“The implication here is that any code committed to a public repository may be accessible forever as long as there is at least one fork of that repository,” the report’s authors claim.

Am I dumb or is this exactly the purpose of forks? I feel like I’m missing something.

  • Eager Eagle
    link
    English
    141 month ago

    from their actual report

    As long as one fork exists, any commit to that repository network (ie: commits on the “upstream” repo or “downstream” forks) will exist forever.
    
       This further cements our view that the only way to securely remediate a leaked key on a public GitHub repository is through key rotation. We’ve spent a lot of time documenting how to rotate keys for the most popularly leaked secret types - check our work out here: howtorotate.com.
    
    • @[email protected]
      link
      fedilink
      English
      111 month ago

      I’m still not sure that answers it. If I fork a project, and the upstream project commits an API key (after I’ve forked it), then they delete the commit, does this commit stay available to me (unexpected behaviour)? Or is it only if I sync that commit into my repo while it’s in the upstream repo (expected behaviour)?

      Or is it talking about this from a comment here:

      Word of caution 2: The commit can still be accessible directly via SHA1. Force push does not delete the commit, it creates a new one and moves the file pointer to it. To truly delete a commit you must delete the whole repo.

      Someone replies and said by having garbage collection kick in it removes this unconnected commit, but it’s not clear to me whether this works for github or just the local git repo.

      Perhaps the issue is that these commits are synced into upstream/downstream repos when synced when they should not be?

      Like I said, I’m really confused about the specifics of this.

      • Morphit
        link
        fedilink
        English
        71 month ago

        I think Github keeps all the commits of forks in a single pool. So if someone commits a secret to one fork, that commit could be looked up in any of them, even if the one that was committed to was private/is deleted/no references exist to the commit.

        The big issue is discovery. If no-one has pulled the leaky commit onto a fork, then the only way to access it is to guess the commit hash. Github makes this easier for you:

        What’s more, Ayrey explained, you don’t even need the full identifying hash to access the commit. “If you know the first four characters of the identifier, GitHub will almost auto-complete the rest of the identifier for you,” he said, noting that with just sixty-five thousand possible combinations for those characters, that’s a small enough number to test all the possibilities.

        I think all GitHub should do is prune orphaned commits from the auto-suggestion list. If someone grabbed the complete commit ID then they probably grabbed the content already anyway.

        • @[email protected]
          link
          fedilink
          English
          21 month ago

          Thanks, I think that explains it a bit more. It is unexpected to me, as a non-git expert, and I’m sure many others.

          • Morphit
            link
            fedilink
            English
            21 month ago

            I guess the funny thing is that each Git commit is internally just a file. Branches and tags are just links to specific commit files and of course commits link to their parents. If a branch gets deleted or jumped back to a previous commit, the orphaned commits are still left in the filesystem. Various Git actions can trigger a garbage collection, but unless you generate huge diffs, they usually stick around for a really long time. Determining if a commit is orphaned is work that Git usually doesn’t bother doing. There’s also a reflog that can let you recover lost commits if you make a mistake.

      • @[email protected]
        link
        fedilink
        English
        51 month ago

        In my experience with GitHub, dropped commits remain indefinitely accessible. I use this to my advantage on pull requests with lots of good commit context that I don’t want totally lost in a squash: by copying result of git log --oneline main... into the PR body. The SHAs remain accessible even after I force push my branch down to a single commit.

        I think there is a theoretical limit to how long these commits remain accessible, but I haven’t ever hit it in my daily usage.