LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 26775 - Wrong code generation in aggressive anti-dependency breaker
Summary: Wrong code generation in aggressive anti-dependency breaker
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Common Code Generator Code (show other bugs)
Version: trunk
Hardware: Other Linux
: P normal
Assignee: Thomas B. Jablin
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-02-29 11:50 PST by Ulrich Weigand
Modified: 2016-03-31 21:10 PDT (History)
6 users (show)

See Also:
Fixed By Commit(s):


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Ulrich Weigand 2016-02-29 11:50:08 PST
Running the following test case:

target datalayout = "E-m:e-i64:64-n32:64"
target triple = "powerpc64-unknown-linux-gnu"

declare void @func(i8*, i64, i64)

define void @test(i8* %context, i32** %elementArrayPtr, i32 %value) {
entry:
  %cmp = icmp eq i32 %value, 0
  br i1 %cmp, label %lreturn, label %lnext

lnext:
  %elementArray = load i32*, i32** %elementArrayPtr, align 8
  %element = load i32, i32* %elementArray, align 4
  %element.ext = zext i32 %element to i64
  %value.ext = zext i32 %value to i64
  call void @func(i8* %context, i64 %value.ext, i64 %element.ext)
  br label %lreturn

lreturn:
  ret void
}


on powerpc64-linux using the command line: 
  llc -optimize-regalloc=false -regalloc=fast


results in the following incorrect assembler output:

# BB#1:                                 # %lnext
        ld 3, 128(1)                    # 8-byte Folded Reload
        lwz 6, 124(1)                   # 4-byte Folded Reload
                                        # implicit-def: %X12
        ld 4, 0(3)
        ld 3, 136(1)                    # 8-byte Folded Reload
        lwz 5, 0(4)
        mr 4, 6
        clrldi   4, 12, 32
        bl func
        nop

Note how the "lwz 6, 124(1)" is marked as implicit-def: %X12,
which is of course incorrect, and later how the clrldi uses
register 12, which is actually undefined at this point.


Looking at the MI dumps, the invalid use of %X12 is introduced by the post-RA scheduling pass, presumably via the anti-dependency breaker.  Before the scheduler, we have:
        %X3<def> = LD 128, %X1; mem:LD8[FixedStack1]
        %X4<def> = LD 0, %X3<kill>; mem:LD8[%elementArrayPtr]
        %X5<def> = LWZ8 0, %X4<kill>; mem:LD4[%elementArray]
        %X4<def> = IMPLICIT_DEF
        %R6<def> = LWZ 124, %X1; mem:LD4[FixedStack2]
        %R4<def> = OR %R6, %R6<kill>
        %X4<def> = RLDICL %X4<kill>, 0, 32
        %X3<def> = LD 136, %X1; mem:LD8[FixedStack0]
        BL8_NOP <ga:@func>
and afterwards we see:
        %X3<def> = LD 128, %X1; mem:LD8[FixedStack1]
        %R6<def> = LWZ 124, %X1; mem:LD4[FixedStack2]
        %X12<def> = IMPLICIT_DEF
        %X4<def> = LD 0, %X3<kill>; mem:LD8[%elementArrayPtr]
        %X3<def> = LD 136, %X1; mem:LD8[FixedStack0]
        %X5<def> = LWZ8 0, %X4<kill>; mem:LD4[%elementArray]
        %R4<def> = OR %R6<kill>, %R6<kill>
        %X4<def> = RLDICL %X12<kill>, 0, 32
        BL8_NOP <ga:@func>

It would appear that the dependency breaker renamed the use-def chain of %X4 from IMPLICIT_DEF to the RLDICL to use %X12 instead, without noticing that there is a def of %R4 in between (which is a subreg of %X4).

It's not fully clear to me whether use of the "fast" register allocator is necessary in general to trigger the bug; however, I didn't find any test case using the default register allocator that would create a register usage pattern before the post-RA scheduler that shows the issue.
Comment 1 Nemanja Ivanovic 2016-03-23 13:32:28 PDT
It seems odd that it would be renaming inputs to the IMPLICIT_DEF. Should AADB have the following condition as well?

Index: lib/CodeGen/AggressiveAntiDepBreaker.cpp
===================================================================
--- lib/CodeGen/AggressiveAntiDepBreaker.cpp    (revision 262322)
+++ lib/CodeGen/AggressiveAntiDepBreaker.cpp    (working copy)
@@ -366,7 +366,7 @@
     // reference either system calls or the register directly. Skip it until we
     // can tell user specified registers from compiler-specified.
     if (MI.isCall() || MI.hasExtraDefRegAllocReq() || TII->isPredicated(MI) ||
-        MI.isInlineAsm()) {
+        MI.isInlineAsm() || MI.isImplicitDef()) {
       DEBUG(if (State->GetGroup(Reg) != 0) dbgs() << "->g0(alloc-req)");
       State->UnionGroups(Reg, 0);
     }


It appears to fix the problem, but I'm not sure if this is the right way to go. Just because it appears reasonable to me, doesn't mean it is.
Comment 2 Thomas B. Jablin 2016-03-23 14:25:11 PDT
I agree that this would fix the problem, but I don't think it is the right solution. The proposed change would lock implicitly defined values to their registers, but I think the real problem is that when X4 is replaced with X12, R4 is not replaced with R12. Does that make sense?
Comment 3 Hal Finkel 2016-03-23 16:11:31 PDT
From the debug output, it looks like the groups get unified correctly:

Anti:   %X4<def> = RLDICL %X4<kill>, 0, 32
        Def Groups: X4=g0->g0(via R4)
        Antidep reg: X4 (real dependency)
        Use Groups: X4=g0->g323(last-use) R4->g324(last-use)
Anti:   %R4<def> = OR %R6, %R6<kill>
        Def Groups: R4=g324->g323(via X4)
        Antidep reg: R4 Use Groups: R6=g0->g325(last-use) R6=g325
Anti:   %R6<def> = LWZ 124, %X1; mem:LD4[FixedStack2]
        Def Groups: R6=g325
        Use Groups: X1=g321
Anti:   %X4<def> = IMPLICIT_DEF
        Dead Def: X4 R4->g326
        Def Groups: X4=g323->g326(via R4)
        Antidep reg: X4
        Rename Candidates for Group g326:
                X4: G8RC :: X0 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X14 X15 X16 X17 X18 X19 X20 X21 X22 X23 X24 X25 X26 X27 X28 X29 X30 X31
AllocationOrder(G8RC_and_G8RC_NOX0) = [ %X3 %X4 %X5 %X6 %X7 %X8 %X9 %X10 %X11 %X12 %X30 %X29 %X28 %X27 %X26 %X25 %X24 %X23 %X22 %X21 %X20 %X19 %X18 %X17 %X16 %X15 %X14 %X31 ]
        Find Registers: [X31: X31(live)] [X14: X14(live)] [X15: X15(live)] [X16: X16(live)] [X17: X17(live)] [X18: X18(live)] [X19: X19(live)] [X20: X20(live)] [X21: X21(live)] [X22: X22(live)] [X23: X23(live)] [X24: X24(live)] [X25: X25(live)] [X26: X26(live)] [X27: X27(live)] [X28: X28(live)] [X29: X29(live)] [X30: X30(live)] [X12: X12]
        Breaking anti-dependence edge on X4: X4->X12(2 refs)
        Use Groups:


So g326 has been unified with g323 (which is R4). But it then does not update that register. I suspect that this code:

          DEBUG(dbgs() << "\tBreaking anti-dependence edge on "
                << TRI->getName(AntiDepReg) << ":");

          // Handle each group register...
          for (std::map<unsigned, unsigned>::iterator
                 S = RenameMap.begin(), E = RenameMap.end(); S != E; ++S) {
            unsigned CurrReg = S->first;
            unsigned NewReg = S->second;

            DEBUG(dbgs() << " " << TRI->getName(CurrReg) << "->" <<
                  TRI->getName(NewReg) << "(" <<
                  RegRefs.count(CurrReg) << " refs)");

            // Update the references to the old register CurrReg to
            // refer to the new register NewReg.
            for (const auto &Q : make_range(RegRefs.equal_range(CurrReg))) {
              Q.second.Operand->setReg(NewReg);
              // If the SU for the instruction being updated has debug
              // information related to the anti-dependency register, make
              // sure to update that as well.
              const SUnit *SU = MISUnitMap[Q.second.Operand->getParent()];
              if (!SU) continue;
              for (DbgValueVector::iterator DVI = DbgValues.begin(),
                     DVE = DbgValues.end(); DVI != DVE; ++DVI)
                if (DVI->second == Q.second.Operand->getParent())
                  UpdateDbgValue(*DVI->first, AntiDepReg, NewReg);
            }


might need a loop over subregisters here so that they get renamed with the parent register.
Comment 4 Thomas B. Jablin 2016-03-24 10:41:47 PDT
I have posted a patch to phabricator for this issue (http://reviews.llvm.org/D18448).
Comment 5 Nemanja Ivanovic 2016-03-25 09:13:22 PDT
This bug is actually caused by the same issue as PR 25503 which is important for LLVM Pipe/MESA. We need this fix ported into 3.8 since MESA requires this fix.

Thanks for the fix Thomas and thanks for opening this bug Uli.
Comment 6 cycheng 2016-03-31 21:10:09 PDT
Committed r265097 (On behalf of Tom)