Created attachment 15272 [details]
LLVM IR for original code, with instcombine and vmaxfp

Hi, I'm working on enabling graphics (OpenGL) on the ppc64le arch. This is done through the llvmpipe driver in the mesa library. The llvmpipe driver converts TGSI shader code into LLVM IR by calling LLVM API functions that build the IR, then calls the LLVM API to compile it as a JIT function, and then calls that function when necessary (multiple times).

To test mesa, we use another open source library called piglit, which contains thousands of tests. Currently, I'm working on tests that fail on ppc64le but pass on x86-64 (Haswell). I started with almost 600 such tests and am now down to about 10.

I have now found that out of those 10, 4 tests succeed if I disable the use of the instcombine pass. I made no other change in mesa, piglit or llvm code. This means that the LLVM IR that mesa builds is the same (with or without the instcombine pass), but the generated assembly is different. Unfortunately, the generated assembly of graphic shaders is quite large (just under 3400 assembly instructions). That's why it is very difficult for me to compare the assembly generated with this pass and without it - the two are very different. I also tried to produce a "short" standalone IR file and reproduce the problem with that, but to no avail.

I have two other findings I'd like to point out:
1. I thought that in the meantime I could disable this pass in llvmpipe and make those 4 tests pass. Unfortunately, I found that even though it fixes those 4 tests, it causes a regression in about 30 other tests - which is very weird and something I find disturbing.
2. There is another option to make those 4 tests pass, and that is to disable use of the Altivec vmaxfp intrinsic (and instead use a couple of generic add/sub/shift IR commands). Now, I don't know why, but that fixes those 4 tests.
Of course, that intrinsic is used in many, many tests (almost all of them), and most of the tests pass, so the problem is not in this Altivec intrinsic itself, but probably in some combination of it and other things.

I have attached 6 files:
- LLVM IR and generated assembly files for the original code, with instcombine and vmaxfp
- LLVM IR and generated assembly files for modified code, without instcombine but with vmaxfp
- LLVM IR and generated assembly files for modified code, with instcombine but without vmaxfp

These files were generated by mesa (see below on how to re-create them). Because the bug is in a certain variant of the fragment shader, I removed all the other shader code from the above files, as it is not relevant. They are therefore shorter than the original dumps.

In addition, here are the steps to re-create this setup on a POWER8, ppc64le machine. I'm using a RHEL 7.2 internal release, but RHEL 7.1LE or Fedora 21/22 ppc64le can be used as well. The important thing is that you need a desktop GUI, such as GNOME, because you need xserver to run.

1. Clone the LLVM, mesa and piglit repos:
1.1 git clone http://llvm.org/git/llvm.git
1.2 git clone git://anongit.freedesktop.org/mesa/mesa
1.3 git clone git://anongit.freedesktop.org/git/piglit
1.4 export LLVM_ROOT=<llvm source folder>
1.5 export MESA_ROOT=<mesa source folder>
1.6 export PIGLIT_ROOT=<piglit source folder>
1.7 export LIBGL_ALWAYS_SOFTWARE=1

2. Build LLVM
2.1 mkdir ~/myllvmbuild ; cd ~/myllvmbuild
2.2 $LLVM_ROOT/configure --disable-dependency-tracking --prefix=$HOME/.local --with-extra-ld-options=-Wl,-Bsymbolic,--default-symver --enable-targets=host --enable-bindings=none --enable-debug-runtime --enable-jit --enable-shared --enable-optimized --disable-clang-arcmt --disable-clang-static-analyzer --disable-clang-rewriter --disable-assertions --disable-docs --disable-libffi --disable-terminfo --disable-timestamps
2.3 make -j8 ; make install

3. Build mesa
3.1 mkdir ~/mesa_debug_build ; cd ~/mesa_debug_build
3.2 $MESA_ROOT/autogen.sh --disable-dependency-tracking --prefix=$HOME/.local --enable-selinux --enable-osmesa --with-dri-driverdir=$HOME/.local/lib/dri --enable-egl --disable-gles1 --enable-gles2 --disable-xvmc --disable-dri3 --with-egl-platforms=x11,drm --enable-shared-glapi --enable-gbm --disable-opencl --enable-glx-tls --enable-texture-float=yes --enable-gallium-llvm --enable-llvm-shared-libs --enable-dri --with-gallium-drivers=svga,swrast --with-dri-drivers=swrast --with-llvm-prefix=$HOME/.local --enable-debug CFLAGS="-O0 -ggdb3" CXXFLAGS="-O0 -ggdb3"
3.3 make -j8
3.4 export LIBGL_DRIVERS_PATH=$HOME/mesa_debug_build/lib/gallium
3.5 export LD_LIBRARY_PATH=$HOME/mesa_debug_build/lib:$HOME/.local/lib

4. Build piglit
4.1 mkdir ~/piglit_build ; cd ~/piglit_build
4.2 export PIGLIT_BUILD_DIR=$HOME/piglit_build
4.3 ccmake $PIGLIT_ROOT -DOPENGL_INCLUDE_DIR=$MESA_ROOT/include
4.3.1 Inside the ccmake screen, make sure you have all dependencies installed. If not, you can find all of them on the rpmfind website (for rpm-based distributions).
4.3.2 Press 'c' twice, then 'g' to generate the config.
4.4 make -j8

5. Make sure everything is configured
5.1 glxinfo | grep OpenGL
5.2 Make sure you see in the results:
OpenGL renderer string: Gallium 0.4 on llvmpipe (LLVM 3.8, 128 bits)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 11.1.0-devel
OpenGL core profile shading language version string: 3.30

6. Run the test
6.1 ~/piglit_build/bin/arb_blend_func_extended-fbo-extended-blend -auto
6.2 The expected output is:
Probe color at (5,5)
  Expected: 0.875000 0.250000 0.500000 0.375000
  Observed: 0.749020 0.125490 0.250980 0.250980
For src/dst 4 3 0
PIGLIT: {"result": "fail" }

Additional tips:
- You can dump the LLVM IR by exporting GALLIVM_DEBUG=ir
- You can dump the generated assembly by exporting GALLIVM_DEBUG=asm
- You can change the DEBUG_EXECUTION define in lp_bld_tgsi_soa.c to 1 to dump values from the shader code.
However, this makes the problem go away :( Probably because it rearranges the code so the bug disappears.
- The llvmpipe driver is located at $MESA_ROOT/src/gallium/drivers/llvmpipe
- Additional auxiliary llvmpipe files are located at $MESA_ROOT/src/gallium/auxiliary/gallivm
- lp_bld_init.c contains the code that chooses which passes we are using (and does other initialization)
- From my debugging, the problem appears to be located *after* the loop_exit: label in the LLVM IR file, meaning all values up to that point appear to be correct. I'm also basing this on the fact that vmaxfp is used only after that label. This is not 100% certain, but I thought I should mention it.
- The fragment shader code is called in 3 places, but if you want to debug it, you can just put a breakpoint in lp_rast_shade_tile() at the line that starts with: variant->jit_function[RAST_WHOLE] - that's the entry point to the fragment shader JIT function.

That's about it,
Created attachment 15273 [details] Generated assembly for original code, with instcombine and vmaxfp
Created attachment 15274 [details] LLVM IR for modified code, without instcombine but with vmaxfp
Created attachment 15275 [details] Generated assembly for modified code, without instcombine but with vmaxfp
Created attachment 15276 [details] LLVM IR for modified code, with instcombine but without vmaxfp
Created attachment 15277 [details] Generated assembly for modified code, with instcombine but without vmaxfp
I forgot to specify the names of the 4 tests I mentioned in the original post:
- arb_blend_func_extended-fbo-extended-blend (this is what I debugged)
- arb_blend_func_extended-fbo-extended-blend-explicit
- glsl-kwin-blur-2
- glsl-orangebook-ch06-bump

You can see the regressions from x86-64 to ppc64le and how to run the above tests at:
http://people.freedesktop.org/~gabbayo/piglit_results/20151110/x86-64_ppc64le/regressions.html
I have new information that may shed light on this situation, and I think it explains the other findings. If I disable the "vsx" mattr, then there are no more failures in the piglit test suite! (except for those tests that also fail on x86, of course).

I think that some VMX or VSX instructions have not been properly marked as lane-sensitive, and that's what's causing the bug on ppc64le. For instance, when I replace vmaxfp with xvmaxsp, some of the tests that failed started to pass, while other tests which passed started to fail.

Please advise.
Oded
Just to make it absolutely clear: Even with instCombine enabled, all the tests pass once I disable VSX instruction generation. I'm inserting a workaround in mesa upstream, until this bug is solved.
-mno-vsx is a bit of a heavy hammer. Can you try adding -disable-ppc-vsx-swap-removal and see if that takes care of the problem?
I couldn't find how to send that option through the LLVM API (because we don't use front-ends), so I just went into the LLVM code and commented out the relevant code in PPCPassConfig::addMachineSSAOptimization():

/*
  if (TM->getTargetTriple().getArch() == Triple::ppc64le &&
      !DisableVSXSwapRemoval)
    addPass(createPPCVSXSwapRemovalPass());
*/

Unfortunately, it didn't help. The failures came back. I even commented out the entire function, just for good measure, but the tests still failed. So I'm staying with -mno-vsx for now, but I hope you guys can find a solution. I really want to convert some of the altivec instructions in mesa to vsx.
Hm. One other quick experiment that would be worth trying would be to disable the recently implemented PPCMIPeepholePass, which you can do in the same module. I kind of doubt that will help, but it's worth a try. Just to be clear, none of the mesa code is currently using VSX via assembly or intrinsics, correct? That would be another potential source of problems. From what you've said and what I recall, only Altivec intrinsics are in use, but I just want to be sure. Thanks, Bill
Hi, I already tried disabling the peephole optimization you mentioned, but it didn't help. And you are correct, mesa doesn't use VSX at all - no intrinsics and definitely no assembly instructions. BTW, mesa doesn't write direct assembly at all; it just builds the LLVM IR.
"For instance, when I replace vmaxfp with using xvmaxsp, some of the tests that failed started to pass, while other tests which passed, started to fail." How did you perform these replacements? vmaxfp and xvmaxsp use different register files, so the numbers must be adjusted. For example, vmaxfp 0,12,31 is the same as xvmaxsp 32,44,63 You probably already know this, but I'm just trying to eliminate any potential red herrings.
I performed those replacements by choosing the xvmaxsp intrinsic instead of vmaxfp when building the LLVM IR. The LLVM backend then generated the correct assembly instruction. Please see:
http://cgit.freedesktop.org/~gabbayo/mesa/commit/?h=mesa-vsx&id=d8490843c2ba5f10f20bec02e2cda00231338a73

All the register allocation and adjustment is done by the LLVM backend, AFAIK.

Oded
Thanks, Oded, that is very helpful. Adding Nemanja Ivanovic to CC list for investigation.
Per Nemanja's suggestion, I ran the failing tests (there are about 7) three times, each time with a different option (and without -vsx, of course). I also ran a specific test which passes without the need to disable VSX, to check for regressions.

1. Run with -mcpu=pwr7
2. Run with -mattr=-direct-move
3. Run with -mattr=-power8-vector

Variant no. 1 had the most significant effect. It fixed almost all of the tests: of the 7 tests above, only 1 kept failing (whereas with -vsx, all 7 pass).
Variant no. 2 gave mixed results. About 4 tests passed while 3 failed. It also caused the "good" test to fail (a regression).
Variant no. 3 had minor impact. It only made 1 of the 7 tests pass.

Thanks,
Oded
Just wanted to list here the 7 tests that fail without -vsx:

$PIGLIT_BUILD/bin/glsl-orangebook-ch06-bump -auto -fbo
$PIGLIT_BUILD/bin/glsl-kwin-blur-2 -auto -fbo
$PIGLIT_BUILD/bin/getteximage-formats -auto
$PIGLIT_BUILD/bin/getteximage-formats init-by-rendering -auto -fbo
$PIGLIT_BUILD/bin/fbo-generatemipmap-formats GL_EXT_texture_sRGB -auto -fbo
$PIGLIT_BUILD/bin/fbo-blending-formats -auto -fbo
$PIGLIT_BUILD/bin/arb_blend_func_extended-fbo-extended-blend -auto

The test I use to check for regressions is:

$PIGLIT_BUILD/bin/ext_framebuffer_multisample-accuracy all_samples srgb depthstencil -auto -fbo

BTW, there is another test which fails without -vsx, but it takes about 30-40 minutes to run so I usually don't run it:

$PIGLIT_BUILD/bin/gl-1.0-blend-func -auto -fbo

Oded
I managed to get everything built and reproduce both the successful and failing test case execution. Now I'm narrowing down the code in LLVM that causes the failure and plan to tackle one test case at a time. The test case I am working on certainly passes when I disable the scalar <-> vector conversion using direct moves so I am hoping to identify either a bug in that code or how that code is affecting something else.
Hi Nemanja, I just wondered if you had a breakthrough. I saw your patch "[PATCH] D15286: Utilize direct move instructions for bitcast operations between floating point and integral values" and thought it may be related to this bug. Thanks, Oded
(In reply to comment #19)
> Hi Nemanja,
> I just wondered if you had a breakthrough.
> I saw your patch "[PATCH] D15286: Utilize direct move instructions for
> bitcast operations between floating point and integral values" and thought
> it may be related to this bug.
>
> Thanks,
>
> Oded

Hi Oded,
I have made progress in the investigation, but not enough to claim this PR as done. Namely, I have identified some issues and am going through them trying to fix them. As part of this investigation, I discovered bugs that aren't directly exposed by your test cases, but are related; a couple of patches I put in deal with those.

Also, in terms of this bug, I am making incremental improvements:
- I've identified two optimizations that I need to fix
- I have a suggested fix for one - this allows some of the test cases to pass
- For the other one, I don't yet have a full understanding of what the actual issue is. I'm working on it.

In any case, there are a number of things that need to be changed to get everything working, and I plan to pick them off one at a time until all the test cases pass.
Hi Nemanja,
I tried the patch you sent me, and the tests above are fixed. I ran the full piglit suite and found no regressions vs. a version without your patch. BTW, I dumped the generated assembly to make sure I'm actually running VSX instructions.

Please let me know if you think we can move ahead with this patch and whether it can also be backported to LLVM 3.8 stable or equivalent. This will finally allow me to start replacing Altivec commands in mesa with VSX :)

Thanks,
Oded
OK, so the actual proper fix came from a seemingly unrelated bug: all the failing test cases pass with the fix for PR 26775, with no regressions. As I was narrowing down the issue, it became obvious that it was related to a simpler manifestation of the same underlying problem. When Tom posted a fix for that one, I realized that I was probably hitting the same issue in this bug (except that the sub/super register classes in question were VMX/VSX, respectively). I applied the fix, ran all the piglit test cases, and confirmed that all of them are fixed with no regressions. I will post a comment in the other bug requesting that it be ported to 3.8.
The bug is fixed in the LLVM master branch.