Created attachment 15272 [details]
LLVM IR for original code, with instcombine and vmaxfp

Hi, I'm working on enabling graphics (OpenGL) on the ppc64le arch. This is done through the llvmpipe driver in the mesa library. The llvmpipe driver converts TGSI shader code into LLVM IR by calling LLVM API functions that build the IR, then calls the LLVM API to compile it as a JIT function, and then calls that function when necessary (multiple times).

To test mesa, we use another open source library called piglit, which contains thousands of tests. Currently, I'm working on tests that fail on ppc64le but pass on x86-64 (Haswell). I started with almost 600 such tests and am now down to about 10.

I have now found that out of those 10, 4 tests succeed if I disable the use of the instcombine pass. I made no other change in mesa, piglit or llvm code. This means that the LLVM IR that mesa builds is the same (with or without the instcombine pass), but the generated assembly is different. Unfortunately, the generated assembly of graphic shaders is quite large (just under 3400 assembly instructions). That's why it is very difficult for me to compare the assembly generated with this pass and without it - the two are very different. I also tried to produce a "short" standalone IR file and reproduce the problem with that, but to no avail.

I have two other findings I'd like to point out:
1. I thought that in the meantime I could disable this pass in llvmpipe and make those 4 tests pass. Unfortunately, I found that even though it fixes those 4 tests, it causes a regression in about 30 other tests - which is very weird and something I find disturbing.
2. There is another option to make those 4 tests pass, and that is to disable use of the Altivec vmaxfp intrinsic (and instead use a couple of generic add/sub/shift IR commands). Now, I don't know why, but that fixes those 4 tests.
Of course, that intrinsic is used in many, many tests (almost all of them), and most of the tests pass, so the problem is not in this Altivec intrinsic itself, but probably in some combination of it and other things.

I have attached 6 files:
- LLVM IR and generated assembly files for the original code, with instcombine and vmaxfp
- LLVM IR and generated assembly files for modified code, without instcombine but with vmaxfp
- LLVM IR and generated assembly files for modified code, with instcombine but without vmaxfp

These files were generated by mesa (see below on how to re-create them). Because the bug is in a certain variant of the fragment shader, I removed all the other shader code from the above files, as it is not relevant. They are therefore shorter than the original dumps.

In addition, here are the steps to re-create this setup on a POWER8, ppc64le machine. I'm using a RHEL 7.2 internal release, but RHEL 7.1LE or Fedora 21/22 ppc64le can be used as well. The important thing is that you need a desktop GUI, such as GNOME, because you need xserver to run.

1. Clone the LLVM, mesa and piglit repos:
1.1 git clone http://llvm.org/git/llvm.git
1.2 git clone git://anongit.freedesktop.org/mesa/mesa
1.3 git clone git://anongit.freedesktop.org/git/piglit
1.4 export LLVM_ROOT=<llvm source folder>
1.5 export MESA_ROOT=<mesa source folder>
1.6 export PIGLIT_ROOT=<piglit source folder>
1.7 export LIBGL_ALWAYS_SOFTWARE=1

2. Build LLVM
2.1 mkdir ~/myllvmbuild ; cd ~/myllvmbuild
2.2 $LLVM_ROOT/configure --disable-dependency-tracking --prefix=$HOME/.local --with-extra-ld-options=-Wl,-Bsymbolic,--default-symver --enable-targets=host --enable-bindings=none --enable-debug-runtime --enable-jit --enable-shared --enable-optimized --disable-clang-arcmt --disable-clang-static-analyzer --disable-clang-rewriter --disable-assertions --disable-docs --disable-libffi --disable-terminfo --disable-timestamps
2.3 make -j8 ; make install

3. Build mesa
3.1 mkdir ~/mesa_debug_build ; cd ~/mesa_debug_build
3.2 $MESA_ROOT/autogen.sh --disable-dependency-tracking --prefix=$HOME/.local --enable-selinux --enable-osmesa --with-dri-driverdir=$HOME/.local/lib/dri --enable-egl --disable-gles1 --enable-gles2 --disable-xvmc --disable-dri3 --with-egl-platforms=x11,drm --enable-shared-glapi --enable-gbm --disable-opencl --enable-glx-tls --enable-texture-float=yes --enable-gallium-llvm --enable-llvm-shared-libs --enable-dri --with-gallium-drivers=svga,swrast --with-dri-drivers=swrast --with-llvm-prefix=$HOME/.local --enable-debug CFLAGS="-O0 -ggdb3" CXXFLAGS="-O0 -ggdb3"
3.3 make -j8
3.4 export LIBGL_DRIVERS_PATH=$HOME/mesa_debug_build/lib/gallium
3.5 export LD_LIBRARY_PATH=$HOME/mesa_debug_build/lib:$HOME/.local/lib

4. Build piglit
4.1 mkdir ~/piglit_build ; cd ~/piglit_build
4.2 export PIGLIT_BUILD_DIR=$HOME/piglit_build
4.3 ccmake $PIGLIT_ROOT -DOPENGL_INCLUDE_DIR=$MESA_ROOT/include
4.3.1 Inside the ccmake screen, make sure you have all dependencies installed. If not, you can find all of them on the rpmfind website (for rpm-based distributions).
4.3.2 Press 'c' twice, then 'g' to generate the config.
4.4 make -j8

5. Make sure everything is configured
5.1 glxinfo | grep OpenGL
5.2 Make sure you see in the results:
OpenGL renderer string: Gallium 0.4 on llvmpipe (LLVM 3.8, 128 bits)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 11.1.0-devel
OpenGL core profile shading language version string: 3.30

6. Run the test
6.1 ~/piglit_build/bin/arb_blend_func_extended-fbo-extended-blend -auto
6.2 The expected output is:
Probe color at (5,5)
  Expected: 0.875000 0.250000 0.500000 0.375000
  Observed: 0.749020 0.125490 0.250980 0.250980
For src/dst 4 3 0
PIGLIT: {"result": "fail" }

Additional tips:
- You can dump the LLVM IR by exporting GALLIVM_DEBUG=ir
- You can dump the generated assembly by exporting GALLIVM_DEBUG=asm
- You can change the DEBUG_EXECUTION define in lp_bld_tgsi_soa.c to 1 to dump values from the shader code.
However, this makes the problem go away :( Probably because it rearranges the code so the bug disappears.
- The llvmpipe driver is located at $MESA_ROOT/src/gallium/drivers/llvmpipe
- Additional auxiliary llvmpipe files are located at $MESA_ROOT/src/gallium/auxiliary/gallivm
- lp_bld_init.c contains the code that chooses which passes we are using (and does other initialization)
- From my debugging, the problem appears to be located *after* the loop_exit: label in the LLVM IR file, meaning all values up to that point appear to be correct. I'm also basing this on the fact that vmaxfp is used only after that label. This is not 100% certain, but I thought I should mention it.
- The fragment shader code is called in 3 places, but if you want to debug it, you can just put a breakpoint in lp_rast_shade_tile() at the line that starts with: variant->jit_function[RAST_WHOLE] - that's the entry point to the fragment shader JIT function.

That's about it,
Created attachment 15273 [details] Generated assembly for original code, with instcombine and vmaxfp
Created attachment 15274 [details] LLVM IR for modified code, without instcombine but with vmaxfp
Created attachment 15275 [details] Generated assembly for modified code, without instcombine but with vmaxfp
Created attachment 15276 [details] LLVM IR for modified code, with instcombine but without vmaxfp
Created attachment 15277 [details] Generated assembly for modified code, with instcombine but without vmaxfp
I forgot to specify the names of the 4 tests I mentioned in the original post:
- arb_blend_func_extended-fbo-extended-blend (this is what I debugged)
- arb_blend_func_extended-fbo-extended-blend-explicit
- glsl-kwin-blur-2
- glsl-orangebook-ch06-bump

You can see the regressions from x86-64 to ppc64le and how to run the above tests at:
http://people.freedesktop.org/~gabbayo/piglit_results/20151110/x86-64_ppc64le/regressions.html
I have new information that may shed light on this situation, and I think it explains the other findings. If I disable the "vsx" mattr, then there are no more failures in the piglit test suite! (except for those tests that also fail on x86, of course).

I think that some VMX or VSX instructions have not been properly marked as lane-sensitive, and that's what's causing the bug on ppc64le. For instance, when I replace vmaxfp with xvmaxsp, some of the tests that failed started to pass, while other tests which passed started to fail.

Please advise.
Oded
Just to make it absolutely clear: Even with instCombine enabled, all the tests pass once I disable VSX instruction generation. I'm inserting a workaround in mesa upstream, until this bug is solved.
-mno-vsx is a bit of a heavy hammer. Can you try adding -disable-ppc-vsx-swap-removal and see if that takes care of the problem?
I couldn't find how to send that option through the LLVM API (because we don't use front-ends), so I just went into the LLVM code and commented out the relevant code in PPCPassConfig::addMachineSSAOptimization():

/*
  if (TM->getTargetTriple().getArch() == Triple::ppc64le &&
      !DisableVSXSwapRemoval)
    addPass(createPPCVSXSwapRemovalPass());
*/

Unfortunately, it didn't help. The failures came back. I even commented out the entire function, just for good measure, but the tests still failed. So I'm staying with -mno-vsx for now, but I hope you guys can find a solution. I really want to convert some of the altivec instructions in mesa to vsx.
Hm. One other quick experiment that would be worth trying would be to disable the recently implemented PPCMIPeepholePass, which you can do in the same module. I kind of doubt that will help, but it's worth a try. Just to be clear, none of the mesa code is currently using VSX via assembly or intrinsics, correct? That would be another potential source of problems. From what you've said and what I recall, only Altivec intrinsics are in use, but I just want to be sure. Thanks, Bill
Hi, I already tried disabling the peephole optimization you mentioned, but it didn't help. And you are correct, mesa doesn't use VSX at all - no intrinsics and definitely no assembly instructions. BTW, mesa doesn't write direct assembly at all; it just builds the LLVM IR.
"For instance, when I replace vmaxfp with using xvmaxsp, some of the tests that failed started to pass, while other tests which passed, started to fail." How did you perform these replacements? vmaxfp and xvmaxsp use different register files, so the numbers must be adjusted. For example, vmaxfp 0,12,31 is the same as xvmaxsp 32,44,63 You probably already know this, but I'm just trying to eliminate any potential red herrings.
I performed those replacements by choosing the xvmaxsp intrinsic instead of vmaxfp when building the LLVM IR. The LLVM backend then generated the correct assembly instruction. Please see:
http://cgit.freedesktop.org/~gabbayo/mesa/commit/?h=mesa-vsx&id=d8490843c2ba5f10f20bec02e2cda00231338a73

All the register allocation and adjustment is done by the LLVM backend, AFAIK.

Oded
Thanks, Oded, that is very helpful. Adding Nemanja Ivanovic to CC list for investigation.
Per Nemanja's suggestion, I ran the failing tests (there are about 7) three times, each time with a different option (and without -vsx, of course). I also ran a specific test which passes without the need to disable VSX, to check for regressions.

1. Run with -mcpu=pwr7
2. Run with -mattr=-direct-move
3. Run with -mattr=-power8-vector

Variant no. 1 had the most significant effect. It fixed almost all of the tests: of the 7 tests above, only 1 kept failing (whereas with -vsx, all 7 pass).
Variant no. 2 gave mixed results. About 4 tests passed while 3 failed. It also caused the "good" test to fail (a regression).
Variant no. 3 had minor impact. It only made 1 of the 7 tests pass.

Thanks,
Oded
Just wanted to list here the 7 tests that fail without -vsx:

$PIGLIT_BUILD/bin/glsl-orangebook-ch06-bump -auto -fbo
$PIGLIT_BUILD/bin/glsl-kwin-blur-2 -auto -fbo
$PIGLIT_BUILD/bin/getteximage-formats -auto
$PIGLIT_BUILD/bin/getteximage-formats init-by-rendering -auto -fbo
$PIGLIT_BUILD/bin/fbo-generatemipmap-formats GL_EXT_texture_sRGB -auto -fbo
$PIGLIT_BUILD/bin/fbo-blending-formats -auto -fbo
$PIGLIT_BUILD/bin/arb_blend_func_extended-fbo-extended-blend -auto

The test I use to check for regressions is:

$PIGLIT_BUILD/bin/ext_framebuffer_multisample-accuracy all_samples srgb depthstencil -auto -fbo

BTW, there is another test which fails without -vsx, but it takes about 30-40 minutes to run so I usually don't run it:

$PIGLIT_BUILD/bin/gl-1.0-blend-func -auto -fbo

Oded
I managed to get everything built and reproduce both the successful and failing test case execution. Now I'm narrowing down the code in LLVM that causes the failure and plan to tackle one test case at a time. The test case I am working on certainly passes when I disable the scalar <-> vector conversion using direct moves so I am hoping to identify either a bug in that code or how that code is affecting something else.
Hi Nemanja, I just wondered if you had a breakthrough. I saw your patch "[PATCH] D15286: Utilize direct move instructions for bitcast operations between floating point and integral values" and thought it may be related to this bug. Thanks, Oded
(In reply to comment #19)
> Hi Nemanja,
> I just wondered if you had a breakthrough.
> I saw your patch "[PATCH] D15286: Utilize direct move instructions for
> bitcast operations between floating point and integral values" and thought
> it may be related to this bug.
>
> Thanks,
>
> Oded

Hi Oded,
I have made progress in the investigation, but not enough to claim this PR as done. Namely, I have identified some issues and am going through them trying to fix them. As part of this investigation, I discovered bugs that aren't directly exposed by your test cases, but are related; a couple of patches I put in deal with those.

Also, in terms of this bug, I am making incremental improvements:
- I've identified two optimizations that I need to fix
- I have a suggested fix for one - this allows some of the test cases to pass
- For the other one, I don't yet have a full understanding of what the actual issue is. I'm working on it.

In any case, there are a number of things that need to be changed to get everything working, and I plan to pick them off one at a time until all the test cases pass.
Hi Nemanja,
I tried the patch you sent me, and the tests above are fixed. I ran the full piglit suite and found no regressions vs. a version without your patch. BTW, I dumped the generated assembly to make sure I'm actually running VSX instructions.

Please let me know if you think we can move ahead with this patch and whether it can also be backported to LLVM 3.8 stable or equivalent. This will finally allow me to start replacing Altivec commands in mesa with VSX :)

Thanks,
Oded
OK, so the actual proper fix came from a seemingly unrelated bug: all the failing test cases pass with the fix for PR 26775, with no regressions. As I was narrowing down the issue, it became obvious that it was related to a simpler manifestation of the same underlying problem. When Tom posted a fix for that one, I realized that I was probably hitting the same issue in this bug (except that the sub/super register classes in question were VMX/VSX, respectively). I applied the fix, ran all the piglit test cases, and confirmed that all of them are fixed with no regressions. I will post a comment in the other bug requesting that it be ported to 3.8.
The bug is fixed in the LLVM master branch.