Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Version control details not strongly associated with results #248

Closed
michaelcfanning opened this issue Sep 17, 2018 · 8 comments
Closed
Labels
2.1.0-CSD.1 Will be fixed in SARIF v2.1.0 CSD.1. design-approved The TC approved the design and I can write the change draft enhancement impact-non-breaking-change resolved-fixed triage-approved

Comments

@michaelcfanning
Copy link
Contributor

The version control details object tells us about the VS server, revision, etc., but doesn't contain an enlistment root location. As a result, we can't use this information to do things like rewrite URLs to point to hosted content.

@michaelcfanning
Copy link
Contributor Author

@lgolding, scenario is that we've enlisted in multiple repositories for a single scan run. solution here might be to create some linkage between version control details and uri base ids.

@ghost ghost self-assigned this Oct 23, 2018
@ghost ghost added enhancement impact-non-breaking-change 2.1.0-CSD.1 Will be fixed in SARIF v2.1.0 CSD.1. to-be-written labels Oct 23, 2018
@ghost
Copy link

ghost commented Oct 23, 2018

@michaelcfanning

Proposed design:

  • Add a uriBaseId property to versionControlDetails. Its value must be a "top-level" uriBaseId in the sense that its corresponding fileLocation object in the run.originalUriBaseIds dictionary must have a null value for its uriBaseId property.
  • Now if a consumer looks at a fileLocation with a uriBaseId property, and wants to know which version of which repo the file came from, the consumer proceeds as follows:
  1. Look up fileLocation.uriBaseId in run.originalUriBaseIds.
  2. If necessary, walk up the chain of originalUriBaseIds until you come to the top-level uriBaseId.
  3. In the run.versionControlProvenance array, find the versionControlDetails object whose uriBaseId property matches that top-level uriBaseId.

Example: Suppose the run object contains the following originalUriBaseIds and versionControlProvenance properties:

{
  "originalUriBaseIds": {
    "SDK_SOURCE_ROOT": {
      "uri": "src/",
      "uriBaseId": "SDK_ROOT"
    },
    "SDK_ROOT": {
      "uri": "file:///C:/sdk/"
    },
    "VIEWER_SOURCE_ROOT": {
      "uri": "src/",
      "uriBaseId": "VIEWER_ROOT"
    },
    "VIEWER_ROOT": {
      "uri": "file:///C:/viewer/"
    }
  },

  "versionControlProvenance": [
    {
      "repositoryUri": "https://github.com/contoso/sdk",
      "revisionId": "b87c4e9"
      "uriBaseId": "SDK_ROOT"
    },
    {
      "repositoryUri": "https://github.com/contoso/viewer",
      "revisionId": "cafdac7"
      "uriBaseId": "VIEWER_ROOT"
    }
  ],
  ...
}

Now suppose the consumer has the following fileLocation object in hand:

"fileLocation": {
  "uri": "io/Streams.cs",
  "uriBaseId": "SDK_SOURCE_ROOT"
}

To find out where this file came from in source control, the consumer proceeds as follows:

  1. Walks run.originalUriBaseIds to find the top-level uriBaseId, which is "SDK_ROOT".
  2. Searches run.versionControlProvenance to find the versionControlDetails object with the matching uriBaseId. It finds the that the file came from the "contoso/sdk" repository.

Update 2018/11/13 @lgolding: The requirement that versionControlProvenance.uriBaseId point to a top-level uriBaseId is too strict. There might be reasons why multiple independent repos need to be synced to a related location, for instance, they might need to be siblings. Then you might have a top-level uriBaseId "HOME", as in @kupsch's example below, that does not correspond to any repo.

@ghost
Copy link

ghost commented Nov 7, 2018

  • This was discussed at TC #25 and the design was approved.
  • At TC #26, we decided not to approve the change draft until we confirmed that it worked with Git submodules.
  • Also, @kupsch has an action item to present an alternative proposal in the issue.

@michaelcfanning FYI.

@kupsch
Copy link

kupsch commented Nov 12, 2018

I would prefer to just base locating provenance on the location, not BaseUriId as this puts less of a burden on the producer.

<comment by="lgolding">:

I think @kupsch means that he would prefer to map a fileLocation to an element of the versionControlProvenance array by using the entire fileLocation object -- that is, its uri and uriBaseId properties -- rather than by using only the fileLocation object's uriBaseId property.

And by "less of a burden on the producer," I think he means that my design requires a separate top-level uriBaseId for every entry in the versionControlProvenance array -- whereas in his design, multiple entries can use the same uriBaseId, but with different uris. In his example, all three versionControlProvenance entries use the same uriBaseId, PACKAGE_ROOT, so he needs only one entry in originalUriBaseIds.

</comment>

{
  "originalUriBaseIds": {
    "HOME": {
      "uri": "/home/user"
    }
    "PACKAGE_ROOT": {
      "uri": "package",
      "uriBaseId": "HOME"
    },
  },

  "versionControlProvenance": [
    {
      "repositoryUri": "https://github.com/example-package/src",
      "revisionId": "b87c4e9"
      "mappedTo": {
        "uriBaseId": "PACKAGE_ROOT"
      },
    },
    {
      "repositoryUri": "https://github.com/example-package/plugin1",
      "revisionId": "cafdac7"
      "mappedTo": {
        "uriBaseId": "PACKAGE_ROOT"
        "uri": "plugin1",
    },
    {
      "repositoryUri": "https://github.com/example-package/plugin1",
      "revisionId": "cafdac7"
      "mappedTo": {
        "uriBaseId": "PACKAGE_ROOT"
        "uri": "plugin2",
    }
  ],
  ...
}

All of these resolve to the same provenance:

"fileLocation":  {
  "uri": "plugin1/x.c",
  useBaseId: "PACKAGE_ROOT",
} 

"fileLocation":  {
  "uri": "package/plugin1/x.c",
  useBaseId: "HOME",
}

"fileLocation":  {
  "uri": "/home/user/package/plugin1/x.c"
} 

To determine if a file is within a version controlled directory, the following algorithm or equivalent will be used:

  1. All file locations are expanded to be an (null, absoluteUri) or (baseUriId, relativeUri) pair if a BaseUriId is not defined denoted as (B, U).

    <comment by="lgolding>

    Let's take the first fileLocation in @kupsch's example. It is: (PACKAGE_ROOT, plugin/x.c). From the definitions in originalUriBaseIds, we can expand this to (HOME, package/plugin/x.c), and finally to (null, /home/user/package/plugin/x.c).

    </comment>

  2. If the fileLocation under consideration maps to (f.B, f.U), and a versionControlProvenance entry's fileLocation mapping is (v.B, v.U), then let S be the set of versionControlProvenance records where f.B == v.B and v.U is a prefix of f.U.

    <comment by="lgolding">

    Continuing the example: After the expansion, f.B is null and f.U is /home/user/package/plugin1/x.c. We can similarly expand the mappedTo properties in the versionControlProvenance array:

    1. (PACKAGE_ROOT, null)(HOME, package)(null, /home/user/package)
    2. (PACKAGE_ROOT, plugin1)(HOME, package/plugin1)(null, /home/package/plugin1)
    3. (PACKAGE_ROOT, plugin2)(HOME, package/plugin2)(null, /home/package/plugin2)

    The versionControlProvenance records at indices 0 and 1 satisfy the condition "f.B == v.B and v.U is a prefix of f.U" (because both /home/user/package and /home/user/package/plugin1 are prefixes of /home/user/package/plugin1/x.c), but the record at index 2 does not satisfy the condition (because /home/user/package/plugin2 is not a prefix of /home/user/package/plugin1/x.c).

    </comment>

  3. If this set if empty then there is no versionControlProvenance for this fileLocation. Otherwise:

    <comment by="lgolding">The set S is not empty. It contains the records at indices 0 and 1.</comment>

  4. versionControlProvenance is the v in S with the longest v.U.

    <comment by "lgolding">

    The record with the longest U is the record at index 1, whose U is /home/user/package/plugin1. Therefore the version control provenance of the file location (PACKAGE_ROOT, plugin/x.c) is the record at index 1.

    </comment>

@ghost
Copy link

ghost commented Nov 13, 2018

I had an action item from TC #26 to confirm that my proposed design works with submodules. It does.

Here's an example. I have two GitHub repos, lgolding/my-app and lgolding/my-plugin. my-app incorporates my-plugin as a submodule, rooted in the default location (the my-plugin subdirectory). At the time of this writing, my-app refers to my-plugin@fd68f11.

The directory tree looks like this:

C:\src\my-app\
    README.md
    my-plugin\
        README.md

The SARIF might look like this:

{
  "$schema": "http://json.schemastore.org/sarif-2.0.0",
  "version": "2.0.0",
  "runs": [
    {
      "originalUriBaseIds": {
        "APP_ROOT": {
          "uri": "/home/lgolding/my-app"
        },
        "PLUGIN_ROOT": {
          "uri": "my-plugin",
          "uriBaseId": "APP_ROOT"
        }
      },

      "versionControlProvenance": [
        {
          "repositoryUri": "https://github.com/lgolding/my-app",
          "revisionId": "728a1d5",
          "uriBaseId": "APP_ROOT"
        }
      ],

      "results": [
        ...
      ]
    }
  ]
}

Note that the SARIF does not include a separate versionControlProvenance entry for the submodule:

              {
                  "repositoryUri": "https://github.com/lgolding/my-plugin",   # NO!!!
                  "revisionId": "fd68f11",
                  "uriBaseId": "PLUGIN_ROOT"
              }

Cloning the root repo with the appropriate command line arguments "just works":

git clone --recurse-submodules https://github.com/lgolding/my-app

You would not want an automated system to do that, and then go ahead a separately clone the submodule's repo.

@ghost ghost removed the change-draft-available label Nov 28, 2018
@ghost
Copy link

ghost commented Dec 9, 2018

@kupsch @michaelcfanning

I've now taken a careful look at Jim's proposed alternative (and interspersed some commentary in it, because I had to carefully work through his example to understand his fairly abstract description :-)), and I agree his design is better. I'll write the change draft from his design.

ghost pushed a commit that referenced this issue Dec 10, 2018
@michaelcfanning
Copy link
Contributor Author

@lgolding this issue does not describe the change that you intend to make. can you please recap the design that you and jim prefer? i am most interested in verifying that we have not overly complicated the simple case (no git sub-modules) in order to accommodate the uncommon case (sub-modules).

ghost pushed a commit that referenced this issue Dec 10, 2018
@ghost
Copy link

ghost commented Dec 12, 2018

@michaelcfanning As we've just discussed, Jim's comment above, in which I interspersed my commentary, is the final design and is what the change draft contains.

Also as we've just discussed: Jim's design works simply even in the single-repo case. It does not introduce complication purely for the sake of the less common multi-repo or submodule cases.

I'm leaving this comment just so that other readers of this issue know that we didn't leave your last comment unaddressed.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.1.0-CSD.1 Will be fixed in SARIF v2.1.0 CSD.1. design-approved The TC approved the design and I can write the change draft enhancement impact-non-breaking-change resolved-fixed triage-approved
Projects
None yet
Development

No branches or pull requests

2 participants