Support for incremental scan results #198
As per TC#20, this feedback potentially entails technical changes and requires further discussion. SARIF currently provides 'baselineState' and other properties related to diffing two complete SARIF log files. We should explore both what it means to perform an incremental scan and how to merge incremental scan results into a base SARIF file. We have not yet identified a potential CSD.2 change on this topic.
By incremental scan, I mean an analysis run that is not a complete re-run of a previous scan, but rather one that re-analyzes a piece of a project that was scanned before. For example, let's assume we have a project Foo. It gets analyzed for the first time: a full scan of the project is performed, generating a full set of analysis results. This is what I would consider a 'baselineState'. Now, let's say I fixed a bunch of bugs and decided to do a complete re-scan of the project, generating another full set of results, 'rescanState'. The next day, though, I fixed only one bug and decided to re-analyze only the part of the project affected by my fix, generating a subset of results, 'incrementalState'. The difference between 'rescanState' and 'incrementalState' is that in the first case the entire project was re-analyzed, while in the second only part of the project was analyzed. This means that the way we merge results with 'baselineState' would differ: in the first case, all of the results would be merged, while in the second, only the results in the areas of the project that were re-analyzed would be merged.
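A minimal Python sketch of the two merge rules described in this comment. It assumes a simplified, hypothetical result shape (file path, rule id, fingerprint) rather than real SARIF objects, and it assumes the merger is told which files were re-analyzed; neither is defined by SARIF today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    # Hypothetical, simplified stand-in for a SARIF result.
    file: str
    rule_id: str
    fingerprint: str


def merge_incremental(baseline: list[Result], incremental: list[Result],
                      rescanned_files: set[str]) -> list[Result]:
    """Merge an incremental scan into a baseline (illustrative only).

    Baseline results in files that were NOT re-analyzed are kept as-is.
    For files that WERE re-analyzed, the incremental results replace the
    baseline ones; a baseline result missing from them is treated as fixed.
    """
    kept = [r for r in baseline if r.file not in rescanned_files]
    fresh = [r for r in incremental if r.file in rescanned_files]
    return kept + fresh
```

Under these assumptions, a full re-scan is simply the case where `rescanned_files` contains every file in the project, so every baseline result is superseded.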
Fortify has been implementing incremental analysis incrementally :) Support for two analyzers has been available for a couple of years (since the 16.2 release). Fortify's incremental scan generates a results file that contains all of the issues, as if they had been merged between two full scans. Each issue should contain a field recording which scans it was found in, so that one could merge a chain of results files and know exactly when each issue appeared and disappeared.
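The chain merge described here could look roughly like the following Python sketch. Fortify's actual file format is not shown in this thread, so the input is a hypothetical chronological list of `(scan_id, issue_keys)` pairs, and `merge_chain` is an illustrative name. The output is, for each issue, the list of scans it was found in.

```python
def merge_chain(scans):
    """Merge an ordered chain of scans into per-issue scan histories.

    `scans` is a chronological list of (scan_id, issue_keys) pairs, where
    `issue_keys` is the set of issue identifiers that scan reported.
    Returns a dict mapping each issue key to the scan ids it was found in,
    oldest first.
    """
    history = {}
    for scan_id, issue_keys in scans:
        # Sort within a scan so the output order is deterministic.
        for key in sorted(issue_keys):
            history.setdefault(key, []).append(scan_id)
    return history
```

The first and last entries of each list show when an issue appeared and when it was last seen.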
In TC #21, we discussed adding properties that would make it easier to determine whether two issues detected in successive runs are in fact logically the same. I don't see how these properties support incremental scanning per se, but I'll record the ideas here anyway. SARIF already defines the following relevant properties:
SARIF could define the following additional properties, to ease the task of deciding whether two results are logically identical:
These properties would allow both physical and logical renames to be tracked from run to run. As we discussed in the TC, an analysis tool would be unlikely to populate them, but a post-processor that understood both the version control system (for physical renames) and the programming language (for logical renames) might be able to deduce them.
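As a rough illustration of the post-processor idea, this Python sketch handles only physical renames. It assumes a hypothetical `renames` map (old path to new path) that a tool might derive from the version control system, and a simplified `(path, rule_id, fingerprint)` result shape; `match_across_runs` is an illustrative name. Logical renames would need language-aware matching and are not attempted.

```python
def match_across_runs(previous, current, renames):
    """Pair results from two runs that are logically the same.

    `previous` and `current` are lists of (path, rule_id, fingerprint)
    tuples; `renames` maps old paths to new paths. Returns
    (matched, new, absent): (old, new) pairs found in both runs, results
    only in the current run, and results only in the previous run.
    """
    # Key each previous result by the path it would have after renames.
    by_key = {(renames.get(path, path), rule, fp): (path, rule, fp)
              for path, rule, fp in previous}
    current_keys = set(current)
    matched = [(old, key) for key, old in by_key.items() if key in current_keys]
    new = [c for c in current if c not in by_key]
    absent = [old for key, old in by_key.items() if key not in current_keys]
    return matched, new, absent
```

Without the rename map, the finding in the renamed file would show up as one "absent" and one "new" result instead of a match.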
I don't think SARIF should keep track of the changes to the source code, but what I think would be useful (I mentioned it in the last two minutes of TC 21) is having a property on every result that lists the scans this result was found in. That way a post-processor has enough information to figure out in which scan each result was first detected, and in which scan it went away.
@katrinaoneil Tagging results solely with scan instances makes sense to me 👍.
@katrinaoneil Could you clarify how an array of "runs in which this result occurred" would help in an "incremental scan" scenario?
@sdrees Sorry, I don't think I fully understand your question; perhaps we can discuss it during the next call. What do you mean by a "field"? If
@lgolding I guess it all depends on what we mean by "support for incremental scan" in the context of SARIF. To be honest, I was not sure what it should mean when I filed the issue, but the more we talk about it, the more convinced I am that SARIF should not track source code changes (the job of the source control system), figure out what to scan given the source changes (the job of the analysis engine), or attempt to map new, removed, and existing issues to each other (the job of the post-processor). However, the format should allow results to contain enough information for the post-processor to differentiate between full scans and incremental scans. If each result in SARIF contained an array of the scan ids it was detected in, this would be possible.
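A hedged sketch of how a post-processor might combine a per-result array of scan ids with knowledge of full versus incremental scans. It assumes the post-processor also knows, for each scan, whether it was full and which files an incremental scan re-analyzed; that metadata, the `Scan` type, and `first_and_gone` are hypothetical and not part of SARIF.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scan:
    scan_id: str
    full: bool                               # True for a full scan
    covered_files: frozenset = frozenset()   # files an incremental scan re-analyzed

    def covers(self, path: str) -> bool:
        return self.full or path in self.covered_files


def first_and_gone(path: str, found_in: set,
                   scans: list) -> tuple[Optional[str], Optional[str]]:
    """Return (first_scan_id, gone_scan_id) for one result in `path`.

    `found_in` is the set of scan ids the result was detected in; `scans`
    is the chronological list of Scan objects. A result only counts as
    gone in a scan that actually re-analyzed its file.
    """
    first: Optional[str] = None
    gone: Optional[str] = None
    for scan in scans:
        if scan.scan_id in found_in:
            if first is None:
                first = scan.scan_id
            gone = None  # detected again, so not gone (yet)
        elif first is not None and gone is None and scan.covers(path):
            gone = scan.scan_id
    return first, gone
```

The key design choice is that an incremental scan that skipped a file never marks a result in that file as gone.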
@katrinaoneil Sure, but the second point boils down to whether to keep history or not. If only the latest scan id is recorded per finding, 👍 no problem. Appending scan ids per finding does not seem sound to me.
@katrinaoneil I agree with @michaelcfanning that a full set of scan ids might be overdoing it. I think we hit the sweet spot with |
As per offline discussion with @katrinaoneil, the important immediate goal for incremental scanning is to properly record the time of last detection (which is tracked in #287). As more specific proposals for the incremental scanning scenario emerge, we will file new issues to address them.
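For contrast with appending scan ids, a minimal Python sketch of a fixed-size alternative: keep only a first and a latest detection time per result, overwriting the latter on each detection. The field names `first_detected` and `last_detected`, the ISO 8601 string timestamps, and `record_detections` are assumptions for illustration; the actual property is being worked out in #287.

```python
def record_detections(history, detected_keys, scan_time):
    """Update per-result detection times after one scan.

    `history` maps a result key to a dict with hypothetical fields
    'first_detected' and 'last_detected'. Only the latest detection time
    is overwritten, so each record stays the same size no matter how many
    scans run.
    """
    for key in detected_keys:
        entry = history.setdefault(key, {"first_detected": scan_time})
        entry["last_detected"] = scan_time
    return history
```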
@lgolding, @michaelcfanning, agreed |
The spec does not currently support incremental scan results (delta results)