Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore threadFlowLocation.kind #202

Closed
ghost opened this issue Jul 17, 2018 · 7 comments
Closed

Restore threadFlowLocation.kind #202

ghost opened this issue Jul 17, 2018 · 7 comments
Labels
2.1.0-CSD.1 Will be fixed in SARIF v2.1.0 CSD.1. code-flows design-approved The TC approved the design and I can write the change draft design-improvement e-ballot-3 impact-non-breaking-change merged Changes merged into provisional draft. p1 Priority 1 issue to close resolved-fixed tc-34

Comments

@ghost
Copy link

ghost commented Jul 17, 2018

We had previously removed annotatedCodeLocation.kind, along with the four "kind-dependent" properties target, targetLocation, values, and state. We propose to partially reverse that decision:

  • Restore threadFlowLocation.kind, giving it (to start with) the same set of values as the old annotatedCodeLocation.kind.
  • Add additional values to describe exceptions (throw, catch, finally, others? e.g. rethrow?)
  • Clarify that other values are allowed.
  • Do not restore the kind-dependent properties.

@michaelcfanning FYI

@ghost ghost added 2.1.0-CSD.1 Will be fixed in SARIF v2.1.0 CSD.1. design-improvement labels Jul 17, 2018
ghost pushed a commit that referenced this issue Aug 15, 2018
ghost pushed a commit that referenced this issue Aug 15, 2018
@ghost ghost self-assigned this Sep 24, 2018
ghost pushed a commit that referenced this issue Sep 24, 2018
ghost pushed a commit that referenced this issue Oct 3, 2018
@fishoak
Copy link

fishoak commented Oct 3, 2018

I have some general observations about threadFlowLocation.kind and some specific things. I'll start with the general stuff.

My impression is that this is threatening to get very unwieldy. I postulate that it will be impossible to specify a set of kinds in this list that is going to satisfy all parties. I would urge that we find the bare minimum set of kinds and stick with those.

Right now the kinds are a mix of three things: the kind of node in the control flow, the form of data access that occurs, and "declaration" which is neither of the previous two. The only one I have a strong reaction to is that last one. It's OK to have a declaration in the flow if it is also associated with something that is implicitly executed, but then the right thing would be for it to be counted as whatever that executable thing is (e.g, an assignment or a constructor). In C, "int x;" has no executable content and so IMHO should not appear in the flow. The proposed "endScope" kind feels a bit like this too.

The data access kinds "assignment", "usage", and the proposed "alias", "passthrough", and "sanitizer" are starting to feel like overkill.

We now have several ways in which something can be called/entered. There is "functionEnter", but also "applicationEntryPoint" is being proposed, and "catch" is kind of similar in that it is an entry to the body of an exception handler. There are others that one might want, such as a signal handler. In that vein, should the entry point to a thread count as functionEnter? Also, a colleague pointed out that it is often useful to distinguish between the kinds of method being called (constructor, destructor, copy-constructor, method, static method, etc...).

To avoid this, it might be worth considering whether we want to simply have a generic "enter" and "exit", and to rely on the fact that one can have additional kinds in the list, e.g.: ["enter", "function"] or ["exit", "handler"]. This would work for scopes too: ["exit", "scope"] would replace the proposed "endScope".

"continuation" might be problematic. First, that's an overloaded term in PL so it will invite confusion. Second, if you're going to have it, then you will also want a "jump". Plus, it's just the next place in the flow, and I don't see a super good reason for distinguishing either jumps or this continuation from other nodes in the flow.

In the "callReturn" note, I just want to confirm that "terminates" does not mean that the execution terminates but that the particular flow that is encoded has come to an end, possibly because the analyzer has concluded it isn't worth showing it.

ghost pushed a commit that referenced this issue Oct 4, 2018
@ghost
Copy link
Author

ghost commented Oct 4, 2018

I agree that this could get unwieldy. I've tried to incorporate everyone's favorite kind values. Let's just stop here, shall we? :-)

I don't feel strongly about declaration -- anybody else have an opinion on that?

I don't really agree with the idea that assignment, usage, alias, passthrough, and sanitizer are overkill if your goal is to track tainted data. "a" is tainted, but we assigned it to "b"; now "b" is tainted. "c" is now an alias for "b", so "c" is tainted. But "c" was sanitized, so we're good.

usage goes as far back as SARIF v1, and I don't remember the justification -- Michael?

Yekaterina explicitly asked for passthrough and endScope, arguing as follows:

Yes, “passthrough” and “endScope” would be useful to us. The former because we differentiate between just usage (e.g. variable was assigned null and then used, that is, dereferenced) and propagation of taint. The latter is useful for explaining why we report memory and resource leaks.

continuation is a request from the Microsoft Static Driver Verifier team; they use it in their native code flows. What is "PL"? In any case, we do try to avoid overloading terms, but there are only so many words in the language. :-) Do you have an alternative?

I suppose we ought to add jump (even though we already have branch) in case someone uses a goto.

The idea of factoring "enter"/"exit" out from "application", "function", "handler", etc. is intriguing. Michael, what do you think?

Yes, "terminate" would refer to the end of the code flow, not the end of program execution. HOWEVER, please look at the latest change draft. That word no longer appears; I've rewritten the notes in this section.

@michaelcfanning
Copy link
Contributor

I think 'usage' is likely something that doesn't have utility for the specific scenario (icon/other kind-specific visualizations) we have in mind. If 'endScope' is interesting then some notion of resource/memory allocation/acquisition may be interesting.

Yes, I agree that simplifying enter/exit may be useful.

@ghost
Copy link
Author

ghost commented Oct 4, 2018

Actually I take it back about needing jump. A jump is just a branch without a condition, if you take "branch" to mean "execute an instruction other than the next one in order". I recall assembly languages where you had BRZ (branch if zero), BRN (branch if non-zero), and just plain BR (unconditional branch). Or JZ, JNZ, and just plain J if the assembler authors liked "jump" better than "branch".

In any case, I don't think we need both.

@ghost ghost added the discussion-ongoing label Oct 8, 2018
@fishoak
Copy link

fishoak commented Oct 10, 2018

I don't really agree with the idea that assignment, usage, alias, passthrough, and sanitizer are overkill if your goal is to track tainted data. "a" is tainted, but we assigned it to "b"; now "b" is tainted. "c" is now an alias for "b", so "c" is tainted. But "c" was sanitized, so we're good.

These terms have a particular meaning for a form of taint analysis, and they do not necessarily make sense for all kinds of taint analysis. It's fine to have these in the standard, but we need to make it clear that they are very loosely defined and likely incomplete.

continuation is a request from the Microsoft Static Driver Verifier team; they use it in their native code flows. What is "PL"? In any case, we do try to avoid overloading terms, but there are only so many words in the language. :-) Do you have an alternative?

By "PL" I mean Programming Languages. See https://en.wikipedia.org/wiki/Continuation. I can't find a good alternative wording, and I don't feel strongly about this, so maybe it's fine.

From a graph-theoretic perspective, there's no difference between control passing to the target of an unconditional jump, and control just passing to the next statement in the normal flow. Similarly, the target that immediately follows a loop control predicate evaluating to false is not all that different from the false target of an if statement. As long as the "continuation" kind is not required to be present that's fine. Perhaps the text should say something like this: "These kinds are primarily intended to help a human reader understand the flow of control through the thread, and should not be interpreted as a formal description of the semantics of the path."

Yes, "terminate" would refer to the end of the code flow, not the end of program execution. HOWEVER, please look at the latest change draft. That word no longer appears; I've rewritten the notes in this section.

OK. that change looks great.

@michaelcfanning michaelcfanning added the p1 Priority 1 issue to close label Jan 24, 2019
@michaelcfanning
Copy link
Contributor

michaelcfanning commented Jan 25, 2019

E-BALLOT #3 PROPOSAL

Restore threadFlowLocation.kind, but make it an array "kinds". Make it an open-ended string, but recommend the set of values approved by the TC at the F2F meeting.

SCHEMA CHANGES
  • In the threadFlowLocation object:

    • Remove the kind property (in case it was still around).
    • Add a property kinds of type string[], optional, minItems: 0, default: []

The recommended values are:

acquire
release
enter
exit
call
return
branch

implicit
false
true
caution
danger
unknown
unreachable

taint
function
handler
lock
memory
resource
scope

@michaelcfanning michaelcfanning added design-approved The TC approved the design and I can write the change draft and removed discussion-ongoing labels Jan 25, 2019
@ghost ghost added the e-ballot-3 label Mar 18, 2019
ghost pushed a commit that referenced this issue Mar 26, 2019
@ghost ghost added code-flows change-draft-available merged Changes merged into provisional draft. tc-34 labels Mar 26, 2019
ghost pushed a commit that referenced this issue Mar 28, 2019
@ghost
Copy link
Author

ghost commented Apr 6, 2019

Approved in e-ballot-3.

@ghost ghost closed this as completed Apr 6, 2019
@ghost ghost mentioned this issue Apr 16, 2019
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.1.0-CSD.1 Will be fixed in SARIF v2.1.0 CSD.1. code-flows design-approved The TC approved the design and I can write the change draft design-improvement e-ballot-3 impact-non-breaking-change merged Changes merged into provisional draft. p1 Priority 1 issue to close resolved-fixed tc-34
Projects
None yet
Development

No branches or pull requests

2 participants