Running the following test case: target datalayout = "E-m:e-i64:64-n32:64" target triple = "powerpc64-unknown-linux-gnu" declare void @func(i8*, i64, i64) define void @test(i8* %context, i32** %elementArrayPtr, i32 %value) { entry: %cmp = icmp eq i32 %value, 0 br i1 %cmp, label %lreturn, label %lnext lnext: %elementArray = load i32*, i32** %elementArrayPtr, align 8 %element = load i32, i32* %elementArray, align 4 %element.ext = zext i32 %element to i64 %value.ext = zext i32 %value to i64 call void @func(i8* %context, i64 %value.ext, i64 %element.ext) br label %lreturn lreturn: ret void } on powerpc64-linux using the command line: llc -optimize-regalloc=false -regalloc=fast results in the following incorrect assembler output: # BB#1: # %lnext ld 3, 128(1) # 8-byte Folded Reload lwz 6, 124(1) # 4-byte Folded Reload # implicit-def: %X12 ld 4, 0(3) ld 3, 136(1) # 8-byte Folded Reload lwz 5, 0(4) mr 4, 6 clrldi 4, 12, 32 bl func nop Note how the "lwz 6, 124(1)" is marked as implicit-def: %X12, which is of course incorrect, and later how the clrldi uses register 12, which is actually undefined at this point. Looking at the MI dumps, the invalid use of %X12 is introduced by the post-RA scheduling pass, presumably via the anti-dependency breaker. Before the scheduler, we have: %X3<def> = LD 128, %X1; mem:LD8[FixedStack1] %X4<def> = LD 0, %X3<kill>; mem:LD8[%elementArrayPtr] %X5<def> = LWZ8 0, %X4<kill>; mem:LD4[%elementArray] %X4<def> = IMPLICIT_DEF %R6<def> = LWZ 124, %X1; mem:LD4[FixedStack2] %R4<def> = OR %R6, %R6<kill> %X4<def> = RLDICL %X4<kill>, 0, 32 %X3<def> = LD 136, %X1; mem:LD8[FixedStack0] BL8_NOP <ga:@func> and afterwards we see: %X3<def> = LD 128, %X1; mem:LD8[FixedStack1] %R6<def> = LWZ 124, %X1; mem:LD4[FixedStack2] %X12<def> = IMPLICIT_DEF %X4<def> = LD 0, %X3<kill>; mem:LD8[%elementArrayPtr] %X3<def> = LD 136, %X1; mem:LD8[FixedStack0] %X5<def> = LWZ8 0, %X4<kill>; mem:LD4[%elementArray] %R4<def> = OR %R6<kill>, %R6<kill> %X4<def> = RLDICL %X12<kill>, 0, 32 BL8_NOP <ga:@func> It would appear that the dependency breaker renamed the use-def chain of %X4 from IMPLICIT_DEF to the RLDICL to use %X12 instead, without noticing that there is a def of %R4 in between (which is a subreg of %X4). It's not fully clear to me whether use of the "fast" register allocator is necessary in general to trigger the bug; however, I didn't find any test case using the default register allocator that would create a register usage pattern before the post-RA scheduler that shows the issue.
It seems odd that it would be renaming inputs to the IMPLICIT_DEF. Should AADB have the following condition as well? Index: lib/CodeGen/AggressiveAntiDepBreaker.cpp =================================================================== --- lib/CodeGen/AggressiveAntiDepBreaker.cpp (revision 262322) +++ lib/CodeGen/AggressiveAntiDepBreaker.cpp (working copy) @@ -366,7 +366,7 @@ // reference either system calls or the register directly. Skip it until we // can tell user specified registers from compiler-specified. if (MI.isCall() || MI.hasExtraDefRegAllocReq() || TII->isPredicated(MI) || - MI.isInlineAsm()) { + MI.isInlineAsm() || MI.isImplicitDef()) { DEBUG(if (State->GetGroup(Reg) != 0) dbgs() << "->g0(alloc-req)"); State->UnionGroups(Reg, 0); } It appears to fix the problem, but I'm not sure if this is the right way to go. Just because it appears reasonable to me, doesn't mean it is.
I agree that this would fix the problem, but I don't think it is the right solution. The proposed change would lock implicitly defined values to their registers, but I think the real problem is that when X4 is replaced with X12, R4 is not replaced with R12. Does that make sense?
From the debug output, it looks like the groups get unified correctly: Anti: %X4<def> = RLDICL %X4<kill>, 0, 32 Def Groups: X4=g0->g0(via R4) Antidep reg: X4 (real dependency) Use Groups: X4=g0->g323(last-use) R4->g324(last-use) Anti: %R4<def> = OR %R6, %R6<kill> Def Groups: R4=g324->g323(via X4) Antidep reg: R4 Use Groups: R6=g0->g325(last-use) R6=g325 Anti: %R6<def> = LWZ 124, %X1; mem:LD4[FixedStack2] Def Groups: R6=g325 Use Groups: X1=g321 Anti: %X4<def> = IMPLICIT_DEF Dead Def: X4 R4->g326 Def Groups: X4=g323->g326(via R4) Antidep reg: X4 Rename Candidates for Group g326: X4: G8RC :: X0 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X14 X15 X16 X17 X18 X19 X20 X21 X22 X23 X24 X25 X26 X27 X28 X29 X30 X31 AllocationOrder(G8RC_and_G8RC_NOX0) = [ %X3 %X4 %X5 %X6 %X7 %X8 %X9 %X10 %X11 %X12 %X30 %X29 %X28 %X27 %X26 %X25 %X24 %X23 %X22 %X21 %X20 %X19 %X18 %X17 %X16 %X15 %X14 %X31 ] Find Registers: [X31: X31(live)] [X14: X14(live)] [X15: X15(live)] [X16: X16(live)] [X17: X17(live)] [X18: X18(live)] [X19: X19(live)] [X20: X20(live)] [X21: X21(live)] [X22: X22(live)] [X23: X23(live)] [X24: X24(live)] [X25: X25(live)] [X26: X26(live)] [X27: X27(live)] [X28: X28(live)] [X29: X29(live)] [X30: X30(live)] [X12: X12] Breaking anti-dependence edge on X4: X4->X12(2 refs) Use Groups: So g326 has been unified with g323 (which is R4). But it then does not update that register. I suspect that this code: DEBUG(dbgs() << "\tBreaking anti-dependence edge on " << TRI->getName(AntiDepReg) << ":"); // Handle each group register... for (std::map<unsigned, unsigned>::iterator S = RenameMap.begin(), E = RenameMap.end(); S != E; ++S) { unsigned CurrReg = S->first; unsigned NewReg = S->second; DEBUG(dbgs() << " " << TRI->getName(CurrReg) << "->" << TRI->getName(NewReg) << "(" << RegRefs.count(CurrReg) << " refs)"); // Update the references to the old register CurrReg to // refer to the new register NewReg. for (const auto &Q : make_range(RegRefs.equal_range(CurrReg))) { Q.second.Operand->setReg(NewReg); // If the SU for the instruction being updated has debug // information related to the anti-dependency register, make // sure to update that as well. const SUnit *SU = MISUnitMap[Q.second.Operand->getParent()]; if (!SU) continue; for (DbgValueVector::iterator DVI = DbgValues.begin(), DVE = DbgValues.end(); DVI != DVE; ++DVI) if (DVI->second == Q.second.Operand->getParent()) UpdateDbgValue(*DVI->first, AntiDepReg, NewReg); } might need a loop over subregisters here so that they get renamed with the parent register.
I have posted a patch to phabricator for this issue (http://reviews.llvm.org/D18448).
This bug is actually caused by the same issue as PR 25503 which is important for LLVM Pipe/MESA. We need this fix ported into 3.8 since MESA requires this fix. Thanks for the fix Thomas and thanks for opening this bug Uli.
Committed r265097 (On behalf of Tom)