git, github, git-branch, remote-branch

Push a branch of a git repo to a new remote (github), hiding its history


My organisation is preparing to release an open-source version of our software on GitHub, but I'm not sure of the best way to approach this:

We have two branches, master and release. master contains some proprietary components that we have decided not to release, and release contains the cleaned-up version that we want to distribute. The problem is that if we just push the release branch to GitHub, the proprietary components can be retrieved by looking through the revision history.

I was considering creating a separate repository, copying the HEAD of release into it, running git init, and pushing that repository to GitHub. However, we want to retain the ability to cherry-pick certain patches from master into release in the future, and push those changes up to GitHub.

Is there a way to do this without maintaining two separate repositories?

Thanks!

Update:

To be a little more specific, this is sort-of what our commit history looks like at the moment:

--- o - o - o - o - f - o - o - f - master
             \
              c - c - c - c - c - c - c - REL - f - f

Where 'o' are commits in the proprietary master branch, 'c' are commits that remove things that should not be published (often not removing entire files, but reworking existing ones so that they don't rely on proprietary components), and 'f' are fixes in master that apply to release as well, and so have been cherry-picked. REL is a tagged version of the code that we deem safe to publish, but only with no history whatsoever (not even previous versions of the release branch, since not all of the proprietary material had been removed before the REL tag).


Solution

  • Ben Jackson's answer already covers the general idea, but I'd like to add a few notes (more than a comment's worth) about the ultimate goal here.

    You can quite easily have two branches, one with an entirely clean (no private files) history, and one complete (with the private files), and share content appropriately. The key is to be careful about how you merge. An oversimplified history might look something like this:

    o - o - o - o - o - o - o (public)
     \       \           \   \
      x ----- x ----x---- x - x (private)
    

    The o commits are the "clean" ones, and the x are the ones containing some private information. As long as you merge from public to private, they can both have all the desired shared content, without ever leaking anything. As Ben said, you do need to be careful about this - you can't ever merge the other way. Still, it's quite possible to avoid - and you don't have to limit yourself to cherry-picking. You can use your normal desired merge workflow.

    In reality, your workflow could end up a little more complex, of course. You could develop a topic (feature/bugfix) on its own branch, then merge it into both the public and the private versions. You could even cherry-pick now and then. Really, anything goes, with the key exception of merging private into public.
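
To make the "merge public into private, never the reverse" rule concrete, here is a minimal sketch in a throwaway repository. The file names (app.txt, proprietary.txt), the demo identity and the temporary directory are all made up for illustration; only the branch names public and private come from the diagram above.

```shell
# Throwaway demo; every file name and the identity below are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git symbolic-ref HEAD refs/heads/public   # start on the clean branch

echo "v1" > app.txt
git add app.txt
git commit -q -m "public: initial app"

git checkout -q -b private                # private forks from public
echo "secret" > proprietary.txt
git add proprietary.txt
git commit -q -m "private: add proprietary component"

git checkout -q public                    # shared work lands on public first
echo "v2" > app.txt
git commit -q -am "public: fix app"

git checkout -q private                   # merge public INTO private only
git merge -q --no-edit public

# private now has the fix; nothing reachable from public mentions the private file.
cat app.txt
if git rev-list --objects public | grep -q proprietary.txt; then
    echo "leaked"
else
    echo "clean"
fi
```

Listing the objects reachable from public with git rev-list --objects is one way to double-check a branch before pushing it, since a push of public only sends what is reachable from it.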

    filter-branch

    So, your problem right now is simply getting your repository into this state. Unfortunately, this can be pretty tricky. Assuming that some commits exist which touch both private and public files, I believe that the simplest method is to use filter-branch to create the public (clean) version:

    git branch public master   # create the public branch from current master
    git filter-branch --tree-filter ... -- public    # filter it (remove private files with a tree filter)
    

    then create a temporary private-only branch, containing only the private content:

    git branch private-temp master
    git filter-branch --tree-filter ... -- private-temp    # remove public files
    

    And finally, create the private branch. If you're okay with only having one complete version, you can simply merge once:

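    git checkout -b private private-temp           # create private and switch to it
    git merge --allow-unrelated-histories public   # Git 2.9+ needs this flag: the two branches share no root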
    

    That'll get you a history with only one merge:

    o - o - o - o - o - o - o - o - o - o (public)
                                         \
      x -- x -- x -- x -- x -- x -- x --- x (private)
    

    Note: there are two separate root commits here. That's a little weird; if you want to avoid it, you can use git rebase --root --onto <SHA1> to transplant the entire private-temp branch onto some ancestor of the public branch.
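
As a rough end-to-end sketch of the single-merge variant, the following runs the whole sequence in a throwaway repository. The tree filters shown (deleting a private-stuff directory, deleting app.txt) are placeholders I made up for the demo, not the filters your repository needs, and the file names and identity are equally hypothetical. It assumes Git 2.9 or later for --allow-unrelated-histories.

```shell
# Throwaway demo; paths, filters and identity are hypothetical placeholders.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1    # skip the deprecation notice in newer Git
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git symbolic-ref HEAD refs/heads/master

# Mixed history: every commit touches a public file and a private one.
mkdir private-stuff
echo "a1" > app.txt
echo "s1" > private-stuff/secret.txt
git add .
git commit -q -m "first mixed commit"
echo "a2" > app.txt
echo "s2" > private-stuff/secret.txt
git add .
git commit -q -m "second mixed commit"

# public: the same commits with the private directory stripped out.
git branch public master
git filter-branch -f --tree-filter 'rm -rf private-stuff' -- public >/dev/null

# private-temp: the same commits with the public file stripped out.
git branch private-temp master
git filter-branch -f --tree-filter 'rm -f app.txt' -- private-temp >/dev/null

# private: private-temp plus one merge of public (the histories are unrelated).
git checkout -q -b private private-temp
git merge -q --no-edit --allow-unrelated-histories public

git rev-list --max-parents=0 private      # prints the two root commits
```

filter-branch keeps the pre-rewrite commits under refs/original/, so the unfiltered history still exists locally afterwards. It is worth checking what is reachable from public (for example with git rev-list --objects public) before pushing that branch anywhere.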

    If you'd like to have some intermediate complete versions, you can do the exact same thing, just stopping here and there to merge and rebase:

    git checkout -b private <private-SHA1>  # use the SHA1 of the first ancestor of private-temp
                                            # you want to merge something from public into
    git merge <public-SHA1>           # merge a corresponding commit of the public branch
    git rebase private private-temp   # rebase private-temp to include the merge
    git checkout private
    git merge <private-SHA1>          # use the next SHA1 on private-temp you want to merge into
                                      # this is a fast-forward merge
    git merge <public-SHA1>           # merge something from public
    git rebase private private-temp   # and so on and so on...
    

    This will get you a history something like this:

    o - o - o - o - o - o - o - o - o - o (public)
          \              \               \
      x -- x -- x -- x -- x -- x -- x --- x (private)
    

    Again, if you want them to have a common ancestor, you can do an initial git rebase --root --onto ... to get started.
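
Here is a small sketch of what that initial transplant could look like. It builds an unrelated private-temp history by hand (with an orphan branch, standing in for the filtered branch) and then replays it onto the root commit of public. The file names and identity are hypothetical, and choosing the public root as the new base is just one possible choice of ancestor.

```shell
# Throwaway demo; file names and identity are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git symbolic-ref HEAD refs/heads/public

echo "a1" > app.txt
git add app.txt
git commit -q -m "public: first"
echo "a2" > app.txt
git commit -q -am "public: second"

# Stand-in for the filtered branch: an unrelated history with only private files.
git checkout -q --orphan private-temp
git rm -rf -q .
echo "s1" > secret.txt
git add secret.txt
git commit -q -m "private: first"
echo "s2" > secret.txt
git commit -q -am "private: second"

# Replay every private-temp commit, its root included, on top of public's root.
public_root=$(git rev-list --max-parents=0 public)
git rebase -q --root --onto "$public_root" private-temp

# The two branches now share an ancestor.
git merge-base public private-temp
```

After this, private-temp sits on top of a public commit, so later merges from public no longer join unrelated histories.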

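    Note: if you have merges in your history already, you'll want to preserve them on any rebases. In older versions of Git the -p (--preserve-merges) option did this; it has since been deprecated and removed, and --rebase-merges (-r) is its replacement.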

    fake it

    Edit: If reworking the history really turns out to be intractable, you can always totally fudge it: squash the entire history down to one commit, on top of the same root commit you already have. Something like this:

    git checkout public
    git reset --soft <root SHA1>
    git commit
    

    So you'll end up with this:

    o - A' (public)
     \
      o - x - o - x - X - A (public@{1}, the previous position of public)
                   \
                    x - x (private)
    

    where A and A' contain exactly the same content, and X is the commit in which you removed all private content from the public branch.

    At this point, you can do a single merge of public into private, and from then on, follow the workflow that I described at the top of the answer:

    git checkout private
    git merge -s ours public
    

    The -s ours tells git to use the "ours" merge strategy. This means it keeps all content exactly as it is in the private branch, and simply records a merge commit showing that you merged the public branch into it. This prevents git from ever applying those "remove private" changes from commit X to the private branch.
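
The whole "fake it" route can be sketched in a throwaway repository like this. The file names, commit messages and identity are invented for the demo, and it assumes the root commit contains nothing private. The last few commands also show a later, ordinary merge from public going through without touching the private file.

```shell
# Throwaway demo; file names, messages and identity are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git symbolic-ref HEAD refs/heads/master

echo "readme" > README.txt
git add README.txt
git commit -q -m "root (nothing private)"
echo "a1" > app.txt
echo "s1" > secret.txt
git add .
git commit -q -m "mixed work"

git branch private                        # keeps the full history and secret.txt
git checkout -q -b public
git rm -q secret.txt
git commit -q -m "X: remove private content"

# Squash public down to a single commit on top of the root.
root=$(git rev-list --max-parents=0 HEAD)
git reset -q --soft "$root"
git commit -q -m "A': squashed public release"

# Record the merge on private without applying X's deletion.
git checkout -q private
git merge -q --no-edit -s ours public

# From here on, normal merges from public into private work as usual.
git checkout -q public
echo "a2" > app.txt
git commit -q -am "public: later fix"
git checkout -q private
git merge -q --no-edit public
ls
```

The old, unsquashed commits stay reachable through private and the reflog, which is fine as long as only public is pushed to the open repository.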

    If the root commit has private information in it, then you'll probably want to create a new root commit, instead of committing once on top of the current one.