git publishing privacy computer-forensics git-reflog

Is there a canonical way to retroactively split a git repo into a public and private variant?


I have a git repository in which some files may have sensitive data hardcoded, or had it hardcoded formerly, so that it still resides at some points in the git history.

In the interest of making the project publicly available so programmers with similar interests can benefit from it and contribute changes back, I want to fork it and sanitize the offending files.

The procedure I considered was as follows:

  1. Shallow/shared clone the repo to a new local location; this folder will become the public variant. Subsequent steps are in the new repo.
  2. Branch master into a branch public-master.
  3. Remove all other branch refs.
  4. Sanitize public-master.
  5. Squash public-master.
  6. git reflog expire --expire-unreachable=now --all && git gc --prune=all --aggressive: remove all unreachable objects, which is now any object not in the public branch.
  7. git push: add public-master back upstream into the private repository.
  8. Set the origin remote to the public repo URL, create master from public-master, and push to origin.
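For illustration, the steps above can be sketched end to end on a throwaway repository. This is a minimal sketch under assumptions, not a vetted sanitization procedure: every path, the file names, the secret "hunter2", and the bare repo standing in for the public URL are hypothetical. It also swaps two details of the plan: it uses a full clone with --no-hardlinks instead of a shallow/shared one (so the new repo owns its objects), and it uses git checkout --orphan to get a single root commit instead of squashing.

```shell
#!/bin/sh
# Sketch only: all paths, file names and the secret "hunter2" are hypothetical.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
private="$work/private"
public="$work/public"
public_remote="$work/public.git"   # stand-in for the real public repo URL

# Build a demo "private" repo whose history contains a hardcoded secret.
git init -q "$private"
cd "$private"
git symbolic-ref HEAD refs/heads/master
echo 'password = "hunter2"' > config.txt
git add config.txt
git commit -q -m "Add config with hardcoded secret"
echo "demo project" > README
git add README
git commit -q -m "Add README"
old_tip=$(git rev-parse HEAD)
secret_blob=$(git rev-parse HEAD:config.txt)

# Step 1: full clone with its own copy of every object.
git clone -q --no-hardlinks "$private" "$public"
cd "$public"

# Steps 2, 4, 5: start a parentless branch, sanitize the tree, commit once.
git checkout -q --orphan public-master
sed 's/hunter2/REDACTED/' config.txt > config.tmp
mv config.tmp config.txt
git add -A
git commit -q -m "Initial public release"

# Step 3: drop every other ref (local branch and remote-tracking refs).
git branch -q -D master
git remote remove origin

# Step 6: expire reflogs, then prune everything that is now unreachable.
git reflog expire --expire=now --expire-unreachable=now --all
git gc -q --prune=now --aggressive

# Step 7: publish the sanitized branch back into the private repo.
git push -q "$private" public-master

# Step 8: point origin at the public repo and push the branch as master.
git init -q --bare "$public_remote"
git remote add origin "$public_remote"
git push -q origin public-master:refs/heads/master

# Check: how many commits remain, and is the old tip still in the clone?
commits=$(git rev-list --all --count)
if git cat-file -e "$old_tip" 2>/dev/null; then old_present=yes; else old_present=no; fi
echo "commits=$commits old_tip_present=$old_present"
```

The last lines are only a sanity check on the clone itself. Before publishing a real project, it would still be worth searching the remaining objects for the sensitive strings, for example with git grep over git rev-list --all.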

Is this sufficient to sanitize my repo, or would it still be possible to recover sensitive data after this? Is there a more sensible and common way to resolve this problem? Are any of the steps extraneous?

For example, can I do this all in one repository, or does the nature of git packs mean I might still push an object that contains sensitive information?
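One way to examine the single-repository question empirically is to push a sanitized orphan branch from a repo that still holds the secret, then ask the receiving repo whether it got the secret blob. The sketch below does that on a throwaway repo; the paths, the file name, and the secret "hunter2" are hypothetical, and a check on a toy repo is an experiment, not a guarantee about every git version or hosting setup.

```shell
#!/bin/sh
# Sketch only: paths, file name and the secret "hunter2" are hypothetical.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
receiver="$work/public.git"   # stand-in for the public remote

# One repo holding both the secret history and the sanitized branch.
git init -q "$work/repo"
cd "$work/repo"
echo 'password = "hunter2"' > config.txt
git add config.txt
git commit -q -m "Add config with hardcoded secret"
secret_blob=$(git rev-parse HEAD:config.txt)

# Sanitized branch with no parent, created in the same repository.
git checkout -q --orphan public-master
echo 'password = "REDACTED"' > config.txt
git add -A
git commit -q -m "Initial public release"

# Push only the sanitized branch to an empty bare repo.
git init -q --bare "$receiver"
git push -q "$receiver" public-master

# The sending repo still has the secret blob; check whether the receiver does.
git cat-file -e "$secret_blob"
if git -C "$receiver" cat-file -e "$secret_blob" 2>/dev/null; then
    result="transferred"
else
    result="not transferred"
fi
echo "secret blob: $result"
```

The design choice here is to test the receiving side rather than reason about pack contents: whatever the sender packs, only objects that actually arrive in the receiver can leak.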


Solution

  • The only problem is I want to be able to pull from the private repo, and then they would have unshared history.

    That seems unavoidable, since you have changed the branch history and squashed it.

    Instead of pulling from the new public repo, I would simply review the changes made in the clone of the new repo and decide which ones I want to add to the local clone of the old private repo:

    # update local content of new repo
    cd /path/to/public/repo 
    git pull
    
    # check what needs to be added
    cd /path/to/clone/of/old/repo
    git --work-tree=/path/to/public/repo add -p .
    

    You will see the diffs between old and new, coming from any new development done on the public repo.