LLVM no longer honours -stack-alignment=4 #21809

Closed
llvmbot opened this issue Oct 31, 2014 · 24 comments
Labels
backend:X86, bugzilla (Issues migrated from bugzilla)

Comments


llvmbot commented Oct 31, 2014

Bugzilla Link: 21435
Resolution: FIXED
Resolved on: Nov 17, 2015 16:49
Version: trunk
OS: All
Attachments: bug1347703.ll
Reporter: LLVM Bugzilla Contributor
CC: @asl, @rnk

Extended Description

LLVM stopped honouring -stack-alignment=4.

This can easily be reproduced with the attached bug1347703.ll doing

$ llc -o - -mtriple=i686-pc-mingw32 -mattr=+avx -stack-alignment=4 bug1347703.ll | grep '<and.*[er]sp'
$

which produces nothing, showing that the stack pointer is never aligned.

However if one disables frame-pointer elimination then it works again:

$ /home/jfonseca/work/vmware/llvm/llvm/build/linux-x86_64/Debug+Asserts/bin/llc -disable-fp-elim -o - -mtriple=i686-pc-mingw32 -mattr=+avx -stack-alignment=4 bug1347703.ll | grep '<and.*[er]sp'
andl $-32, %esp
andl $-32, %esp

$

This doesn't seem like the right behavior -- using the frame pointer to align the stack is an implementation detail; there are probably other ways, and if the frame pointer is indeed necessary it should be employed automatically.

This regression happened between LLVM release 3.2 and release 3.3. I strongly suspect r168627 -- llvm-mirror/llvm@1243922


rnk commented Oct 31, 2014

Based on the input IR and what I've seen of the backend, this looks like some kind of phase ordering problem.

The input test case has no allocas, so we don't think we need to realign the stack for any highly aligned allocations. Then register allocation (or vector legalization) runs, and we are forced to spill some large vector values. This introduces a highly aligned stack allocation, but by then it's too late: we can't go back and add the realignment prologue.
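
For illustration -- a minimal sketch with placeholder names, not the attached test case -- IR of roughly this shape runs into the problem: no allocas, but enough live 256-bit values that the allocator has to create 32-byte spill slots after the frame decisions were already made:

```llvm
; Sketch only (hypothetical function, not from bug1347703.ll): there are no
; allocas, but with enough simultaneously live <8 x float> values the register
; allocator must spill YMM registers, introducing 32-byte-aligned stack slots
; after the "do we need to realign?" decision has already been taken.
define <8 x float> @spill_pressure(<8 x float> %a, <8 x float> %b,
                                   <8 x float> %c, <8 x float> %d) {
entry:
  %m0 = fmul <8 x float> %a, %b
  %m1 = fmul <8 x float> %c, %d
  %m2 = fmul <8 x float> %a, %d
  %m3 = fmul <8 x float> %b, %c
  ; ...more independent 256-bit temporaries than there are YMM registers...
  %s0 = fadd <8 x float> %m0, %m1
  %s1 = fadd <8 x float> %m2, %m3
  %r = fadd <8 x float> %s0, %s1
  ret <8 x float> %r
}
```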

See this awesome comment in X86RegisterInfo about it maybe being too late to realign the stack:

bool X86RegisterInfo::canRealignStack(const MachineFunction &MF) const {
  if (MF.getFunction()->hasFnAttribute("no-realign-stack"))
    return false;

  const MachineFrameInfo *MFI = MF.getFrameInfo();
  const MachineRegisterInfo *MRI = &MF.getRegInfo();

  // Stack realignment requires a frame pointer. If we already started
  // register allocation with frame pointer elimination, it is too late now.
  if (!MRI->canReserveReg(FramePtr))
    return false;

  // If a base pointer is necessary. Check that it isn't too late to reserve
  // it.
  if (MFI->hasVarSizedObjects())
    return MRI->canReserveReg(BasePtr);
  return true;
}

Even if that codepath isn't executing, it's possible that there is a similar issue.


llvmbot commented Oct 31, 2014

CC'ing Jim and Bob (as they can see the radar); they should be able to chime in on this one.

IIRC, this was some refactoring I did with Jakob, but I don't recall the full details. It is/was a kind of phase ordering issue, but we never came up with a good solution.


rnk commented Oct 31, 2014

I think we could conservatively assume that any SSA value will need to be spilled to the stack and align the current frame to handle at least that.


llvmbot commented Oct 31, 2014

Reed, your proposal is to revert r168627, correct? It's conservatively correct, but you waste a register in many cases. I would prefer a more targeted approach.

Any reason we shouldn't go with Jose's suggestion: reserve the frame pointer to enable stack-realignment, if the user specifies an alignment larger than the default stack alignment?


llvmbot commented Oct 31, 2014

> Reed, your proposal is to revert r168627, correct? It's conservatively correct, but you waste a register in many cases. I would prefer a more targeted approach.
>
> Any reason we shouldn't go with Jose's suggestion: reserve the frame pointer to enable stack-realignment, if the user specifies an alignment larger than the default stack alignment?

To be clear -- my suggestion was to enable the frame pointer when needed.

I'm not sure that enabling it only when the user specifies a stack alignment different from the default stack alignment covers all cases. Imagine that the function needs to spill AVX 32-byte registers. Wouldn't that require a 32-byte aligned stack pointer?

Or what if the function needed to call another function, passing a 32-byte vector by reference?
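
For illustration, a rough sketch of that second case (the names and types here are placeholders, not taken from the attached test case):

```llvm
; Sketch only: a 32-byte vector passed by reference needs a 32-byte-aligned
; temporary on the stack, even though the incoming stack alignment (e.g. on
; 32-bit Windows) is only 4 bytes.
declare void @consume(<8 x float>*)

define void @caller(<8 x float> %v) {
entry:
  %tmp = alloca <8 x float>, align 32
  store <8 x float> %v, <8 x float>* %tmp, align 32
  call void @consume(<8 x float>* %tmp)
  ret void
}
```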

In other words, it seems to me that the alternatives are to revert r168627, or to disable frame-pointer omission blindly...


rnk commented Oct 31, 2014

> Reed, your proposal is to revert r168627, correct? It's conservatively correct, but you waste a register in many cases. I would prefer a more targeted approach.
>
> Any reason we shouldn't go with Jose's suggestion: reserve the frame pointer to enable stack-realignment, if the user specifies an alignment larger than the default stack alignment?

I feel like if the default stack alignment is 4 that should do the same thing as the user specifying a stack alignment of 4, right?


llvmbot commented Nov 3, 2014

> > Reed, your proposal is to revert r168627, correct? It's conservatively correct, but you waste a register in many cases. I would prefer a more targeted approach.
> >
> > Any reason we shouldn't go with Jose's suggestion: reserve the frame pointer to enable stack-realignment, if the user specifies an alignment larger than the default stack alignment?
>
> To be clear -- my suggestion was to enable the frame pointer when needed.

If the FP isn't available/reserved, then the register allocator uses unaligned loads/stores to fill/spill. Thus, the FP isn't strictly needed for correctness in the cases you've described.


llvmbot commented Nov 3, 2014

> > Reed, your proposal is to revert r168627, correct? It's conservatively correct, but you waste a register in many cases. I would prefer a more targeted approach.
> >
> > Any reason we shouldn't go with Jose's suggestion: reserve the frame pointer to enable stack-realignment, if the user specifies an alignment larger than the default stack alignment?
>
> I feel like if the default stack alignment is 4 that should do the same thing as the user specifying a stack alignment of 4, right?

IIRC, the default stack alignment on x86 is 16 bytes (i.e., -stack-alignment=2). If the user specifies a larger alignment than the default, then we might consider reserving the FP and realigning the stack to the larger, non-default alignment.


llvmbot commented Nov 3, 2014

> IIRC, the default stack alignment on x86 is 16 bytes

It depends on the platform. The default stack alignment for x86 on Windows is 4 bytes.

In fact, it seems this bug is mis-titled: the problem is not that LLVM doesn't honor -stack-alignment=, but rather that the x86 backend is broken for stack alignments (default or overridden) of less than 16 bytes.


llvmbot commented Nov 3, 2014

> If the FP isn't available/reserved, then the register allocator uses unaligned loads/stores to fill/spill. Thus, the FP isn't strictly needed for correctness in the cases you've described.

If I try the attached input, targeting x86_64 Linux with the default stack alignment (which is 16 bytes), and grep for rsp, I get

$ llc-3.4 -o - -mtriple=x86_64-unknown-linux -mattr=+avx bug1347703.ll | grep rsp
subq $40, %rsp
movq %r8, 32(%rsp) # 8-byte Spill
movq %rdi, 24(%rsp) # 8-byte Spill
movq 96(%rsp), %rdi
vmovdqu %ymm1, -32(%rsp) # 32-byte Folded Spill
vmovdqu %ymm1, -64(%rsp) # 32-byte Folded Spill
vmovdqu %ymm1, -96(%rsp) # 32-byte Folded Spill
movq 24(%rsp), %rdi # 8-byte Reload
vandps -32(%rsp), %ymm3, %ymm3 # 32-byte Folded Reload
vandps -64(%rsp), %ymm10, %ymm10 # 32-byte Folded Reload
vandps -96(%rsp), %ymm10, %ymm10 # 32-byte Folded Reload
[...]

The spills are indeed with VMOVDQU, but doesn't VANDPS require the m256 argument to be 32-byte aligned?


llvmbot commented Nov 3, 2014

It has been a while and I don't remember the numbers, but I'm pretty sure that reverting r168627 would cause a significant performance regression. We should figure out how to fix the problems with -stack-alignment=4 some other way.


rnk commented Nov 3, 2014

> > IIRC, the default stack alignment on x86 is 16 bytes
>
> It depends on the platform. The default stack alignment for x86 on Windows is 4 bytes.

This is why I care. :)


I believe there are two workarounds for Jose:

  1. Use the alignstack(16) (or 32?) attribute on the functions in question. If you control the frontend, this is the easiest and least fragile thing to do. (A rough IR sketch follows below.)

  2. Use the -force-align-stack hidden x86 backend command line option. This is probably fragile and will break if and when we re-examine flags embedded in libraries like this.

If you can do 1, we can probably let this issue stand.
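
At the IR level, workaround 1 would look roughly like the sketch below (the function and types are placeholders, and 32 rather than 16 is presumably what is needed once AVX spills are involved):

```llvm
; Sketch only: alignstack(N) on a function definition asks the backend to
; realign the stack in the prologue, regardless of the incoming alignment.
define void @kernel(<8 x float>* %dst, <8 x float>* %src) alignstack(32) {
entry:
  %v = load <8 x float>, <8 x float>* %src, align 32
  %r = fadd <8 x float> %v, %v
  store <8 x float> %r, <8 x float>* %dst, align 32
  ret void
}
```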


llvmbot commented Nov 3, 2014

> I believe there are two workarounds for Jose:
>
> 1. Use the alignstack(16) (or 32?) attribute on the functions in question. If you control the frontend, this is the easiest and least fragile thing to do.

The workaround we took was to set TargetOptions::NoFramePointerElim and TargetOptions::NoFramePointerElimNonLeaf on all x86 and x86_64 builds of Mesa. We were already doing this for non-release builds (to facilitate stack backtraces when debugging/profiling), but after this issue it is clear that frame-pointer omission is not just a minor optimization in LLVM but has far-reaching implications for X86 code generation, so I'm no longer comfortable having release and non-release builds differ on this matter, as most of our automated tests actually run with non-release builds.

> 2. Use the -force-align-stack hidden x86 backend command line option. This is probably fragile and will break if and when we re-examine flags embedded in libraries like this.
>
> If you can do 1, we can probably let this issue stand.

We generate LLVM IR directly (Mesa llvmpipe driver), and I'm used to tweaking things in order to work around LLVM issues, and can do so.

But, for whatever my opinion is worth, I have to say this seems a serious flaw in the LLVM x86 backend. Taking an approach that works in 99% of cases for the sake of performance, instead of one that works in 100%, seems the wrong trade-off for a compiler. And if there are indeed possible workarounds, then I fail to understand why the workarounds are pushed onto the user instead of automatically detecting the problematic situations and deploying the workarounds without intervention. At the very least you should add an assert/error message when this happens -- anything less is just sweeping the problem under the rug...


llvmbot commented Nov 5, 2014

> > > Reed, your proposal is to revert r168627, correct? It's conservatively correct, but you waste a register in many cases. I would prefer a more targeted approach.
> > >
> > > Any reason we shouldn't go with Jose's suggestion: reserve the frame pointer to enable stack-realignment, if the user specifies an alignment larger than the default stack alignment?
> >
> > To be clear -- my suggestion was to enable the frame pointer when needed.
>
> If the FP isn't available/reserved, then the register allocator uses unaligned loads/stores to fill/spill. Thus, the FP isn't strictly needed for correctness in the cases you've described.

But in this case it would have to be forbidden for those fill/spill loads/stores (or maybe it can only happen for loads) to be eliminated and folded into the arithmetic ops -- though only for SSE, not AVX, since the VEX encoding can handle unaligned loads in ordinary arithmetic ops whereas SSE can't.

However, it seems a bit silly to use unaligned stores/loads for spills in any case. I'm aware newer CPUs don't necessarily have much of a performance penalty for that (Intel Core ones in particular), though on some the performance impact can be rather horrendous (anyone still using a P4?).


rnk commented Nov 17, 2015

This is over a year old and the attached LLVM IR no longer parses with TOT llc.

llc's command-line interface isn't exactly stable or user-facing either, so I don't think there's anything to do here.


llvmbot commented Nov 17, 2015

> This is over a year old and the attached LLVM IR no longer parses with TOT llc.
>
> llc's command-line interface isn't exactly stable or user-facing either, so I don't think there's anything to do here.

This bug is not specific to llc...

I provided the llc command line / IR merely to facilitate reproing and fixing the bug.

We saw the bug on Mesa + llvmpipe with JIT.

The problem here is that LLVM is broken when the incoming stack alignment is 4. You can probably repro the issue with clang -mstack-alignment=4 and the right input.

That said, it's clear that nothing happened, so probably nothing will ever happen. On Mac and Linux the ABI (both 32- and 64-bit) states that the stack is always 16-byte aligned, so this is never an issue. (Maybe it matters for Microsoft, since on 32-bit Windows the stack alignment is 4 bytes.)

But if you don't want to fix it, then at the very least mark it as WONTFIX. Closing as INVALID on the grounds that you can no longer repro it is IMO an insult to the time I spent obtaining a small repro, as per the bug guidelines.

LLVM has very few stable interfaces, so if LLVM devs sit on a bug long enough they can close it on the basis that it can no longer be reproduced...


llvmbot commented Nov 17, 2015

bug1347703-r253245.ll
Updated IR, correct as of r253245.


llvmbot commented Nov 17, 2015

Here. Updated LLVM IR.

Basically I took the original IR and massaged it with vim substitute commands:

:%s/getelementptr \({.*}\)\*/getelementptr \1, \1*
:%s/load \(.*\)\* /load \1, \1* 
:%s/getelementptr \(\w\+\)\*/getelementptr \1, \1*


rnk commented Nov 17, 2015

> But if you don't want to fix it, then at the very least mark it as WONTFIX. Closing as INVALID on the grounds that you can no longer repro it is IMO an insult to the time I spent obtaining a small repro, as per the bug guidelines.

Sorry, our bugzilla resolution statuses are really old and sometimes rude. I end up using "invalid" to mean "working as intended". I'm only trying to convey that we aren't taking action because:

  • There isn't a correctness issue, since we'll use unaligned spills and fills
  • Nobody has updated the issue in a year, indicating that very few people rely on the stack alignment TargetOption

It sounds like it's still important to you and you don't like your workaround, so we can reopen it.

To resummarize what's wrong, LLVM is making a bad performance decision: it is choosing to use unaligned vector spills in order to omit the frame pointer. In Chad's case, this helped, but in your case, it hurts. Is that accurate?


llvmbot commented Nov 17, 2015

> To resummarize what's wrong, LLVM is making a bad performance decision: it is choosing to use unaligned vector spills in order to omit the frame pointer. In Chad's case, this helped, but in your case, it hurts. Is that accurate?

The bigger issue was that LLVM used to produce incorrect code, because the vector spills/reloads were unaligned but used instructions which required alignment. Now with AVX (which not only has unaligned movs like SSE, but whose arithmetic instructions don't need alignment either) this is less of an issue (it can still fold the reloads into the arithmetic ops; I just hope it no longer does that if only SSE is available).

But our code will sometimes produce quite massive amounts of reloads/spills (of vector regs), and certainly using unaligned instructions for that sounds like an odd decision. It's true newer CPUs "usually" don't really have huge penalties for that, but it still goes against every optimization manual I've ever seen. You should typically only use unaligned data if you can't control data alignment easily. But certainly for spills/reloads, doing it unaligned makes no sense whatsoever.


rnk commented Nov 17, 2015

I agree, in general, doing unaligned vector loads seems pretty lame. I think in Chad's test case, the vector spills were not in the hot path, and it was on 32-bit x86, where going from 6 to 7 GPRs makes a big difference.


llvmbot commented Nov 17, 2015

> I agree, in general, doing unaligned vector loads seems pretty lame. I think in Chad's test case, the vector spills were not in the hot path, and it was on 32-bit x86, where going from 6 to 7 GPRs makes a big difference.

In our case, we don't have much use for scalar regs :-). But in any case, there just has to be a better solution than using unaligned vector loads/stores for vector spills (on 32-bit, you don't have too many vector regs either).


llvmbot commented Nov 18, 2015

> It sounds like it's still important to you and you don't like your workaround, so we can reopen it.

Thanks.

> To resummarize what's wrong, LLVM is making a bad performance decision: it is choosing to use unaligned vector spills in order to omit the frame pointer. In Chad's case, this helped, but in your case, it hurts. Is that accurate?

When I filed this bug, it was a crash, not a performance issue.

The 1347703 is the number of a VMware internal bug report, and I've just double-checked: this was causing crashes on Windows, in 32-bit processes, on both AVX and non-AVX systems.

But I've just retried, and the crash/correctness issue might indeed have been fixed in LLVM since this was filed.

For example, if I do

llc -o - -mtriple=i686-pc-mingw32 -mattr=-avx,+sse2 -stack-alignment=4 bug1347703.ll

then all accesses to the stack are done with unaligned MOVs. This was not the case before.

I have all llc versions from 3.3 to 3.6, and indeed it seems the issue was fixed going from LLVM 3.3 to 3.4.

Performance was never my worry here. Even with the crash fixed, it makes sense for us to continue using the frame pointer. My main concern was the crash.

So it looks like you can resolve this bug after all!


rnk commented Nov 18, 2015

> Performance was never my worry here. Even with the crash fixed, it makes sense for us to continue using the frame pointer. My main concern was the crash.
>
> So it looks like you can resolve this bug after all!

OK, thanks for looking into it.

I think we'll mark this fixed then. If someone runs into the performance issue, there are tons of workarounds: pass -fno-omit-frame-pointer, -mstackrealign, -mstack-alignment=32, or whatever. Those all use a different codepath from TargetOptions::StackAlignmentOverride, and should avoid the unaligned loads.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 9, 2021
This issue was closed.