This is an archive of the discontinued LLVM Phabricator instance.

Add support for artificial tail call frames
ClosedPublic

Authored by vsk on Aug 8 2018, 2:14 PM.

Details

Summary

This patch teaches lldb to detect when there are missing frames in a
backtrace due to a sequence of tail calls, and to fill in the backtrace
with artificial tail call frames when this happens. This is only done
when the execution history can be determined from the call graph and
from the return PC addresses of calls on the stack. Ambiguous sequences
of tail calls (e.g anything involving tail calls and recursion) are
detected and ignored.

Depends on D49887.

Diff Detail

Repository
rLLDB LLDB

Event Timeline

vsk created this revision.Aug 8 2018, 2:14 PM
vsk edited the summary of this revision. (Show Details)Aug 8 2018, 2:15 PM
vsk updated this revision to Diff 160224.Aug 10 2018, 7:00 PM

Rebase, and update the patch to use DW_AT_call_return_pc information.

vsk updated this revision to Diff 160666.Aug 14 2018, 12:24 PM
vsk retitled this revision from WIP: Basic tail call frame support to Add support for artificial tail call frames.
vsk edited the summary of this revision. (Show Details)
vsk removed subscribers: JDevlieghere, jingham, probinson, aprantl.

Interesting topic and very well-written patch. Semantics look reasonable, but can't say much about the details. Added a few minor notes inline.

lldb/include/lldb/Symbol/Function.h
456 ↗(On Diff #160666)

Typo: tail-calling

lldb/packages/Python/lldbsuite/test/decorators.py
691 ↗(On Diff #160666)

Two leading "" missing?

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
3757 ↗(On Diff #160666)

Does any of the tests fail in case LLVM DWARFs change? Or is it very unlikely?

lldb/source/Symbol/Function.cpp
149 ↗(On Diff #160666)

Easy to confuse with lambda assignment (although it wouldn't make any sense).

auto resolve = [&]() -> Function * {
   ...
};
lazy_callee.def = resolve();
resolved = true;

Maybe?

aprantl added inline comments.Aug 24 2018, 10:12 AM
lldb/include/lldb/Symbol/Function.h
331 ↗(On Diff #160666)

llvm::PointerUnion ?

vsk updated this revision to Diff 162416.Aug 24 2018, 10:39 AM
vsk marked 4 inline comments as done.

Thanks for the feedback!

lldb/include/lldb/Symbol/Function.h
331 ↗(On Diff #160666)

It's not possible to use PointerUnion here because const char * has 1-byte alignment. There's no space in the pointer to store a discriminator bit.

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
3757 ↗(On Diff #160666)

Yes, the tests in this patch would fail if TAG_call_site entries were nested within the nearest enclosing lexical block (as specified by DWARF 5). We would have a heads-up about such a breaking change. That said, it's unlikely to catch us by surprise, because we control the LLVM DWARF producer (see https://reviews.llvm.org/D49887).

aprantl added inline comments.Aug 24 2018, 10:50 AM
lldb/include/lldb/Symbol/Function.h
331 ↗(On Diff #160666)

Good point!

vsk updated this revision to Diff 166378.Sep 20 2018, 4:04 PM

I've added SB API support (SBFrame::IsArtificial), a SB API test, fleshed out the remaining tests, and rebased. PTAL, thanks!

Can you add a test that makes sure that when you stop in a frame that has artificial frames above it, and then you do "finish", or "step out" past the end of frame 0, the presence of the artificial frame doesn't confuse us? I am pretty sure that will just work, but it would be good to make sure that it actually does.

Really "step out past the end of the frame" isn't really necessary, since that doesn't pay any attention to the frames above it, it just runs the code till the frame ID changes. But "finish" does look at the parent frame, so it might get confused.

vsk planned changes to this revision.Sep 20 2018, 5:10 PM

Can you add a test that makes sure that when you stop in a frame that has artificial frames above it, and then you do "finish", or "step out" past the end of frame 0, the presence of the artificial frame doesn't confuse us? I am pretty sure that will just work, but it would be good to make sure that it actually does.

Really "step out past the end of the frame" isn't really necessary, since that doesn't pay any attention to the frames above it, it just runs the code till the frame ID changes. But "finish" does look at the parent frame, so it might get confused.

Great suggestion. The 'finish' command doesn't appear to understand artificial frames, and allows program execution to continue past the point where a tail call is made. I'll try and teach it to "look past" artificial frames.

vsk updated this revision to Diff 166394.Sep 20 2018, 6:42 PM

Teach SBThread::StepOut and SBThread::ReturnFromFrame to behave as-if artificial frames were not present.

This preserves the current behavior of "finish" and "thread return". The alternatives -- stepping out into an artificial frame or issuing errors -- don't appear to be better. The newly-added thread_step_out_or_return test covers this.

vsk updated this revision to Diff 166534.Sep 21 2018, 12:14 PM

As discussed offline, print out a note when stepping out of a frame with artificial ancestors explaining that they were skipped while stepping out. See the added test: functionalities/tail_call_frames/thread_step_out_message.

vsk planned changes to this revision.Sep 21 2018, 5:00 PM

While testing this out, I found an issue with CallEdge::GetReturnPCAddress. Getting the load address there adds an unnecessary slide to the PC. I'll try to have that worked out next week.

vsk updated this revision to Diff 167038.Sep 25 2018, 7:22 PM

The bug I thought I saw in CallEdge::GetReturnPCAddress turned out to be an issue I introduced in dsymutil. It's not correct for dsymutil to relocate the return PC value in TAG_call_site entries, because the correct section slide offset can't be applied until we have a running process (and things like dylibs are mapped in).

Other changes:

  • Add a test which requires both inline frame and tail call frame support.
  • Require a call edge’s return PC attribute to match the register context’s PC exactly. Otherwise, we report incorrect frames.
  • Improve logging and add more tests.
aprantl added inline comments.Sep 26 2018, 8:34 AM
lldb/source/Target/StackFrameList.cpp
331 ↗(On Diff #167038)

On what path can this happen? Aren't all paths that set ambiguous=true returning immediately?

404 ↗(On Diff #167038)

auto synth_frame = llvm::make_unique<StackFrame>(..) ?

vsk updated this revision to Diff 167148.Sep 26 2018, 8:42 AM
vsk marked an inline comment as done.
  • Use std::make_shared<StackFrame>(...), instead of StackFrameSP(new ...).
lldb/source/Target/StackFrameList.cpp
331 ↗(On Diff #167038)

If a recursive call to dfs() sets ambiguous = true and returns early, we also want its caller to return early instead of visiting any additional edges. This is a small optimization: the code is correct without the early return.

vsk added a comment.Oct 2 2018, 12:09 PM

Friendly ping.

This looks pretty good! I have one last question about CallEdge inside.

lldb/include/lldb/Symbol/Function.h
304 ↗(On Diff #167148)

Does this also work for C functions? If yes, would symbol_name be a more accurate description?

Is this pointer globally unique in the program, or can two mangled names appear in a legal C program that don't refer to the same function?

lldb/include/lldb/Target/StackFrame.h
65 ↗(On Diff #167148)

e.g.,

lldb/packages/Python/lldbsuite/test/functionalities/tail_call_frames/ambiguous_tail_call_seq2/TestAmbiguousTailCallSeq2.py
5 ↗(On Diff #167148)

This test should be restricted to only run with clang as the compiler or the -glldb flag won't work.

I see. It is implicit in in skipUnlessHasCallSiteInfo — perhaps we should just generally add -glldb to CFLAGS? It's not a bug deal.

vsk updated this revision to Diff 168155.Oct 3 2018, 12:18 PM
vsk marked an inline comment as done.
  • Address feedback from Adrian.
lldb/include/lldb/Symbol/Function.h
304 ↗(On Diff #167148)

Yes, C functions are handled. I'll use "symbol_name", hopefully that makes it clearer.

304 ↗(On Diff #167148)

Great question. No, a symbol name is not necessarily globally unique. Prior to C11, the one-definition-rule doesn't apply. Even with the ODR, a private symbol in a shared library may have the same name as a strong definition in the main executable.

I haven't tried to disambiguate ODR conflicts in this patch. What happens when a conflict occurs is: FindFunctionSymbols prioritizes the symbol in the main executable, and if the call edge's return PC doesn't exactly match what's in the register state we decline to create any artificial frames. I.e. in the presence of ODR conflicts, we only present artificial frames when we're 100% sure they are accurate.

Handling ODR conflicts is a 'to do', though. I'll make an explicit note of that here.

lldb/packages/Python/lldbsuite/test/functionalities/tail_call_frames/ambiguous_tail_call_seq2/TestAmbiguousTailCallSeq2.py
5 ↗(On Diff #167148)

Yes, skipUnlessHasCallSiteInfo requires clang for now. I don't think we can add -glldb to a higher-level CFLAGS variable without breaking compiling with gcc?

aprantl added inline comments.Oct 3 2018, 12:43 PM
lldb/include/lldb/Symbol/Function.h
304 ↗(On Diff #167148)

Thanks! It might be interesting to grep through an XNU dsym to see just how common conflicts are in typical C code.

vsk added inline comments.Oct 3 2018, 2:18 PM
lldb/include/lldb/Symbol/Function.h
304 ↗(On Diff #167148)

Of a total of ~27,000 function names in xnu, 140 names were duplicates. In every case I spot-checked, a subprogram with a duplicate AT_name had a unique AT_linkage_name. So, well under 0.5% in that project.

aprantl accepted this revision.Oct 3 2018, 2:23 PM

Thanks!

This revision is now accepted and ready to land.Oct 3 2018, 2:23 PM
This revision was automatically updated to reflect the committed changes.

Unfortunately, the bots are broken because of the FileCheck issue, so I can't confirm with them, but I see a number of these tests fail in our local testing. Some fail on both Windows and Linux and some just fail on Linux. Here are the failing tests:

Linux:
lldb-Suite :: functionalities/tail_call_frames/disambiguate_call_site/TestDisambiguateCallSite.py
lldb-Suite :: functionalities/tail_call_frames/disambiguate_paths_to_common_sink/TestDisambiguatePathsToCommonSink.py
lldb-Suite :: functionalities/tail_call_frames/disambiguate_tail_call_seq/TestDisambiguateTailCallSeq.py
lldb-Suite :: functionalities/tail_call_frames/inlining_and_tail_calls/TestInliningAndTailCalls.py
lldb-Suite :: functionalities/tail_call_frames/sbapi_support/TestTailCallFrameSBAPI.py
lldb-Suite :: functionalities/tail_call_frames/thread_step_out_message/TestArtificialFrameStepOutMessage.py
lldb-Suite :: functionalities/tail_call_frames/unambiguous_sequence/TestUnambiguousTailCalls.py

Windows:
lldb-Suite :: functionalities/tail_call_frames/sbapi_support/TestTailCallFrameSBAPI.py
lldb-Suite :: functionalities/tail_call_frames/thread_step_out_or_return/TestSteppingOutWithArtificialFrames.py

Let me know what you need to investigate.

vsk added a comment.Oct 11 2018, 2:57 PM

Unfortunately, the bots are broken because of the FileCheck issue, so I can't confirm with them, but I see a number of these tests fail in our local testing. Some fail on both Windows and Linux and some just fail on Linux. Here are the failing tests:

Linux:
lldb-Suite :: functionalities/tail_call_frames/disambiguate_call_site/TestDisambiguateCallSite.py
lldb-Suite :: functionalities/tail_call_frames/disambiguate_paths_to_common_sink/TestDisambiguatePathsToCommonSink.py
lldb-Suite :: functionalities/tail_call_frames/disambiguate_tail_call_seq/TestDisambiguateTailCallSeq.py
lldb-Suite :: functionalities/tail_call_frames/inlining_and_tail_calls/TestInliningAndTailCalls.py
lldb-Suite :: functionalities/tail_call_frames/sbapi_support/TestTailCallFrameSBAPI.py
lldb-Suite :: functionalities/tail_call_frames/thread_step_out_message/TestArtificialFrameStepOutMessage.py
lldb-Suite :: functionalities/tail_call_frames/unambiguous_sequence/TestUnambiguousTailCalls.py

Windows:
lldb-Suite :: functionalities/tail_call_frames/sbapi_support/TestTailCallFrameSBAPI.py
lldb-Suite :: functionalities/tail_call_frames/thread_step_out_or_return/TestSteppingOutWithArtificialFrames.py

Let me know what you need to investigate.

Strange, I didn't get any bot failure notifications in the days after this landed. Could you share the output from the failing tests?

zturner added a subscriber: vsk.Oct 11 2018, 2:59 PM

See the other email thread. The bots have been broken since September, all
of them complain about missing FileCheck executable

https://reviews.llvm.org/D53002

Is the thread I'm referring to.

In D50478#1262717, @vsk wrote:

Unfortunately, the bots are broken because of the FileCheck issue, so I can't confirm with them, but I see a number of these tests fail in our local testing. Some fail on both Windows and Linux and some just fail on Linux. Here are the failing tests:

Linux:
lldb-Suite :: functionalities/tail_call_frames/disambiguate_call_site/TestDisambiguateCallSite.py
lldb-Suite :: functionalities/tail_call_frames/disambiguate_paths_to_common_sink/TestDisambiguatePathsToCommonSink.py
lldb-Suite :: functionalities/tail_call_frames/disambiguate_tail_call_seq/TestDisambiguateTailCallSeq.py
lldb-Suite :: functionalities/tail_call_frames/inlining_and_tail_calls/TestInliningAndTailCalls.py
lldb-Suite :: functionalities/tail_call_frames/sbapi_support/TestTailCallFrameSBAPI.py
lldb-Suite :: functionalities/tail_call_frames/thread_step_out_message/TestArtificialFrameStepOutMessage.py
lldb-Suite :: functionalities/tail_call_frames/unambiguous_sequence/TestUnambiguousTailCalls.py

Windows:
lldb-Suite :: functionalities/tail_call_frames/sbapi_support/TestTailCallFrameSBAPI.py
lldb-Suite :: functionalities/tail_call_frames/thread_step_out_or_return/TestSteppingOutWithArtificialFrames.py

Let me know what you need to investigate.

Strange, I didn't get any bot failure notifications in the days after this landed. Could you share the output from the failing tests?

All the failures on Windows are happening when validating the function name. For example:

======================================================================

FAIL: test_tail_call_frame_sbapi (TestTailCallFrameSBAPI.TestTailCallFrameSBAPI)

----------------------------------------------------------------------

Traceback (most recent call last):

  File "E:\_work\55\s\llvm\tools\lldb\packages\Python\lldbsuite\test\functionalities\tail_call_frames\sbapi_support\TestTailCallFrameSBAPI.py", line 19, in test_tail_call_frame_sbapi

    self.do_test()

  File "E:\_work\55\s\llvm\tools\lldb\packages\Python\lldbsuite\test\functionalities\tail_call_frames\sbapi_support\TestTailCallFrameSBAPI.py", line 64, in do_test

    self.assertTrue(frame.GetDisplayFunctionName() == name)

AssertionError: False is not True

Config=x86_64-E:\_work\55\b\LLVMBuild\Release\bin\clang.exe

----------------------------------------------------------------------

There are several different failures on Linux. Here's the first one:

FAIL: LLDB (/vstsdrive/_work/38/b/LLVMBuild/bin/clang-8-x86_64) :: test_dwarf (lldbsuite.test.lldbtest.TestDisambiguateCallSite)

--- FileCheck trace (code=1) ---
/vstsdrive/_work/38/b/LLVMBuild/bin/FileCheck /vstsdrive/_work/38/s/llvm/tools/lldb/packages/Python/lldbsuite/test/functionalities/tail_call_frames/disambiguate_call_site/main.cpp -implicit-check-not=artificial

FileCheck input:
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
  * frame #0: 0x0000000000400580 a.out`sink() at main.cpp:13:4 [opt]
    frame #1: 0x00000000004005b8 a.out`main(argc=1, (null)=<unavailable>) at main.cpp:28:3 [opt]
    frame #2: 0x00007f980aff7830 libc.so.6`__libc_start_main + 240
    frame #3: 0x00000000004004a9 a.out`_start + 41


FileCheck output:

/vstsdrive/_work/38/s/llvm/tools/lldb/packages/Python/lldbsuite/test/functionalities/tail_call_frames/disambiguate_call_site/main.cpp:15:17: error: CHECK-NEXT: expected string not found in input
 // CHECK-NEXT: func2{{.*}} [opt] [artificial]
                ^
<stdin>:3:2: note: scanning from here
 frame #1: 0x00000000004005b8 a.out`main(argc=1, (null)=<unavailable>) at main.cpp:28:3 [opt]
 ^
<stdin>:3:80: note: possible intended match here
 frame #1: 0x00000000004005b8 a.out`main(argc=1, (null)=<unavailable>) at main.cpp:28:3 [opt]
                                                                              ^

Let me know if you need more logs.