
Pushing a limited commit history to a remote git repository


I have a git repository on a lab machine at my school and have been running into an issue I've been trying to solve.

Due to the arrangement of the CUDA SDK we're using, I have two remotes for the same directory, but I do not want all the commits from one remote, "origin", pushed to the other remote, "proj1". I'll explain below:

Originally, this directory had a git repository with a single remote, and for example, the following commit history:

A-B-C-D-E <-(origin/master)

I then added a second remote and created a local branch that I push to and fetch from on that remote:

A-B-C-D-E-G <-(origin/master) (master)
        '        
        '-F-H-I <-(proj1/newbranch) (newbranch)

Now when I go to push my changes from "newbranch" to remote "proj1/newbranch", I do NOT want to push commits A-E with it, I want only to push from F and forward.

I know that an orphaned branch is exactly what I'm looking for here, but our lab is running git 1.7.x, which does not have that feature yet, and getting the admins to update it simply takes too long (we don't have permissions to do it ourselves of course).

I also read I could reorder my commits with rebase so that F is the oldest commit, then I could push a single commit to "proj1". But wouldn't doing this alter/mess up my history on the master branch as well? (A-E are already on origin/master)

So I am wondering if I am missing some feature of git to accomplish what I'd like? Is there some other way to delete the commit history of "newbranch" or at least break it off? Maybe what I am doing is bad practice, but like I said, I need to have all files in this directory for the CUDA SDK, and I don't want to mess with that.


Solution

  • [[tl;dr... If you can make your "origin" project a subdirectory of the "proj1" project and use Git submodules, you will live a life of happiness and peace. If you can't or won't, you are doomed to spend 10% of your time on "proj1" development and 90% of your time fighting Git to a bloody death.]]

    Okay, I'm almost positive that the way you think you need to approach this isn't just bad practice, it's unworkable, and so I have a moral responsibility as a fellow Git user to tell you what you should do instead of helping you do what you think you should do. Maybe someone else will come along with a magical solution that I haven't thought of, but I wouldn't hold my breath.

    I think you need to come to terms with the fact that these are two separate projects, and they need to have two separate working directories (with separate ".git" subdirectories). This poses two immediate problems, of course. First, if you need these files heavily intermingled in the same directory, this may not seem workable; I try to address this below. Second, if the directories are completely separate, then their histories are tracked completely separately, so when you commit a particular version of "proj1", you won't have a record of which version of "origin" was used to run it.

    If you do want to track the version of "origin" used for each commit of "proj1", then Git submodules (see git help submodule) are the way to go. For this to work, "origin" must be kept in its own subdirectory of the "proj1" tree. You can organize the rest of "proj1" however you like. Again, if you need the files intermingled, see below. Git submodules work well when the development is taking place on one project ("proj1") and the second project ("origin") is just using Git for the convenience of keeping up-to-date and remembering which version of "origin" was used to run which version of "proj1". (Submodules allow changes to be made to the second project, if needed, but it's kind of cumbersome to get everything working right, so it's much better if the submodule is just "read only".)
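
    As a rough illustration of the submodule layout, here is a minimal sketch. The directory names "sdk-upstream" and "origin-sdk" and the file names are made up for the demo, and a throwaway local repository stands in for the real "origin" remote. The "protocol.file.allow" setting is only there because recent Git versions refuse local-path submodule clones by default; with a real URL it should not be needed.

```shell
set -e
# Throwaway identity so commits work in a scratch environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Stand-in for the upstream "origin" (SDK) repository (hypothetical name).
git init -q sdk-upstream
cd sdk-upstream
echo sdk > sdk.cu
git add sdk.cu
git commit -q -m "SDK import"
cd "$work"

# "proj1" is its own repository with its own history.
git init -q proj1
cd proj1
echo mine > mine.cu
git add mine.cu
git commit -q -m "F: first proj1 commit"

# Record the SDK as a submodule in a subdirectory (hypothetical name).
git -c protocol.file.allow=always submodule --quiet add "$work/sdk-upstream" origin-sdk
git commit -q -m "Track the SDK as a submodule"

# Shows which SDK commit this proj1 commit is pinned to.
git submodule status
```

    Pushing "proj1" then sends only the "proj1" commits plus a pointer to the SDK commit, not the SDK history itself.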

    By the way, Git subtrees (not orphan branches) are the closest thing to what you think you want, but they do require the subproject to be housed in its own dedicated subdirectory, and they do require that the entire history of the subproject be included in the main project; they just allow the subproject to be split off and pushed or pulled independently. You could, in theory, use them for your setup, treating "origin" as the main project and keeping "proj1" in a separate subdirectory as a "subtree". You would work on the repository as normal, and "git subtree" would provide a mechanism to split off the work being done in the "proj1" subtree and commit it separately. Great, right? Sadly, subtrees are only available as a contributed script (shipped under "contrib" and, I think, not installed by default) with Git 1.7.11 and later. Still, you could try "git help subtree" to check.

    Any of these solutions, though, will require one project to be isolated in its own subdirectory.

    If the files absolutely must be intermingled in a single directory, you are in for a world of pain. The most straightforward way is still to keep the two projects apart using the mechanisms above (i.e., either totally independent working directories, or one project in a subdirectory of the other using submodules or subtrees) and then build a symlink tree. The symlink tree can either:

    1. Be a separate "build" directory where you actually run the project, containing symlinks to every file in the real "proj1" and "origin" working directories.

    2. Be a set of symlinks actually added to the "proj1" working directory and checked in to the repository. They can all point to the copy of "origin" in a subdirectory managed as a submodule.
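
    Option 1 could be sketched roughly as follows. The directory and file names ("origin-sdk", "proj1", "build", "sdk.cu", "mine.cu") are placeholders, and the loop only handles a flat directory; a real SDK tree with subdirectories would need a recursive variant.

```shell
set -e
work=$(mktemp -d)

# Two separate working directories (placeholders for the real checkouts).
mkdir -p "$work/origin-sdk" "$work/proj1" "$work/build"
echo sdk  > "$work/origin-sdk/sdk.cu"
echo mine > "$work/proj1/mine.cu"

# Populate the build directory with one symlink per file from each project.
for src in "$work/origin-sdk" "$work/proj1"; do
    for f in "$src"/*; do
        ln -s "$f" "$work/build/$(basename "$f")"
    done
done

# The build directory now appears to contain both projects' files.
ls -l "$work/build"
```

    Neither repository ever sees the other's files; only the "build" directory mixes them, and it can be regenerated at any time.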

    The only alternative to a symlink tree that I can think of is much more painful. Technically, you could set up a "proj1" working directory that ".gitignore"s all "origin" files. Then, you can happily run "git" to manage the "proj1" files only, ignoring any "origin" stuff. When you want to work with "origin" (e.g., to update it with a "git pull"), you can run Git with a "--git-dir" and/or "--work-tree" argument to match your work tree with a different ".git" directory (configured so that alternate ".gitignore-origin" files are used or something). I've never tried this, and it sounds horrible, but you might get it to work.
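
    For what it's worth, here is an untested-in-anger sketch of that idea. Instead of alternate ".gitignore" files it uses each repository's own "info/exclude" file, which is my substitution rather than anything from your setup. The names ".git-origin", "origin_git", "sdk.cu" and "mine.cu" are all made up for the demo.

```shell
set -e
# Throwaway identity so commits work in a scratch environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
mkdir "$work/lab"
cd "$work/lab"
echo sdk  > sdk.cu     # stands in for an "origin" (SDK) file
echo mine > mine.cu    # stands in for a "proj1" file

# Repository 1: the ordinary .git directory tracks only proj1 files.
git init -q
mkdir -p .git/info
printf '%s\n' 'sdk.cu' '.git-origin/' >> .git/info/exclude
git add mine.cu
git commit -q -m "proj1: first commit"

# Repository 2: a second git directory sharing the same work tree.
git init -q --bare .git-origin
mkdir -p .git-origin/info
printf '%s\n' 'mine.cu' >> .git-origin/info/exclude

# Hypothetical helper: run git against the second repository.
origin_git() {
    git --git-dir="$work/lab/.git-origin" --work-tree="$work/lab" "$@"
}
origin_git add sdk.cu
origin_git commit -q -m "origin: SDK import"

# Each repository lists only its own files.
git ls-files
origin_git ls-files
```

    Every "origin" operation has to go through the wrapper, and every new file has to be added to the right exclude list by hand, which is why I would still call this the painful option.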

    Now, as for the current state of your Git repository, you have a problem. Your "newbranch" is now deeply intertwined with the "origin" project's history, and there's no simple way to break it apart. If you want to rebuild the history, you either need to use some filter-branch black magic, or you need to do it manually (e.g., for each commit from beginning to end, check it out in your current tree, copy the "proj1" files to a fresh Git working directory leaving out any "origin" files, and recommit).
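
    The manual route might look something like the sketch below. It builds a tiny stand-in for your mixed repository first, so the commit names, the file "mine.cu" and the directory "proj1-clean" are all assumptions; in real use you would replace "mine.cu" with the list of paths that belong to "proj1".

```shell
set -e
# Throwaway identity so commits work in a scratch environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Stand-in for the existing mixed repository: E on master, F and H on newbranch.
git init -q "$work/mixed"
cd "$work/mixed"
echo sdk > sdk.cu
git add sdk.cu
git commit -q -m "E: SDK files"
base=$(git rev-parse HEAD)
git checkout -q -b newbranch
echo one > mine.cu
git add mine.cu
git commit -q -m "F: start proj1"
echo two >> mine.cu
git commit -q -am "H: more proj1 work"

# Fresh repository that will hold only the proj1 history.
git init -q "$work/proj1-clean"

# Replay each commit after the base, oldest first, copying only proj1 paths.
for c in $(git rev-list --reverse "$base"..newbranch); do
    git archive "$c" -- mine.cu | tar -xf - -C "$work/proj1-clean"
    msg=$(git log -1 --format=%B "$c")
    (cd "$work/proj1-clean" && git add -A && git commit -q -m "$msg")
done

(cd "$work/proj1-clean" && git log --oneline)
```

    This keeps the commit messages but not the original author dates or hashes, and it assumes every replayed commit contains the listed paths and changes at least one of them; otherwise "git archive" or "git commit" will stop the loop.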

    With respect to orphan branches, they won't help you here. Orphan branches are just plain old branches that happen to share no history with other branches in the same repository. It may seem like what you're after, but once you go through the pain of setting them all up, you'll discover something distressing. When you "git checkout newbranch" to work on "proj1", Git will check out all your "newbranch" files and delete all the CUDA SDK files! And when you "git checkout master" to get access to the CUDA SDK files, Git will check them all out and delete all your "newbranch" files! How do you get all the files at once? Obviously, you set up two separate working directories and check "newbranch" out in one and "master" out in the other, then combine them using one of the methods above. It provides no advantage over treating these as completely separate projects. You can't have an orphan branch that somehow "keeps" the CUDA SDK files around without having them checked in to the branch.
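
    You can see the effect in a scratch repository with a sketch like this one. The file names are made up, and it assumes a Git that has "git checkout --orphan" (added in 1.7.2, as far as I know).

```shell
set -e
# Throwaway identity so commits work in a scratch environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
# Pin the first branch name so the demo does not depend on local defaults.
git symbolic-ref HEAD refs/heads/master

echo sdk > sdk.cu
git add sdk.cu
git commit -q -m "SDK files on master"

# Start a branch with no parent commit, then empty its index and work tree.
git checkout -q --orphan newbranch
git rm -rf -q .
echo mine > mine.cu
git add mine.cu
git commit -q -m "proj1 files on an orphan branch"
ls    # lists only mine.cu: the SDK file is gone from the work tree

git checkout -q master
ls    # lists only sdk.cu: the proj1 file is gone instead
```

    The two branches really do share no history, but they also never share a work tree, which is the whole problem.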