
Git: version a branch of a third-party repository within my own repository


I would like to use third-party code hosted in a Git repo within my own Git repo. I will need to modify and extend some parts of the third-party code, but I would still like to be able to merge selected new commits from the third-party repo into my project.

I tried to use a Git submodule for this. I leave the remote pointing at the third-party repo so I can pull in changes, and I create a new branch in the submodule that I don't push to the third-party repo (I don't have write access anyway).

But when somebody else clones my repo and runs git submodule update, they get "Fetched in submodule path 'thirdparty-repo', but it did not contain abcd", because the remote repository that is still configured in .gitmodules does not contain the commit abcd.
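
A hedged reproduction of this failure, using throwaway local repositories instead of real remotes. The names upstream, project, clone and my-changes are hypothetical, a local directory stands in for the third-party repository, and Git 2.28 or newer is assumed (for init.defaultBranch):

```sh
set -eu
tmp=$(mktemp -d)
cd "$tmp"
# Throwaway HOME so these settings never touch your real Git configuration.
export HOME="$tmp"
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
git config --global user.name demo
git config --global user.email demo@example.com
git config --global init.defaultBranch main
# Git 2.38.1+ refuses local-path submodule clones unless this is allowed.
git config --global protocol.file.allow always

# "upstream" plays the third-party repository you cannot push to.
git init -q upstream
echo v1 > upstream/lib.txt
git -C upstream add lib.txt
git -C upstream commit -q -m "upstream v1"

# "project" is your superproject, with upstream added as a submodule.
git init -q project
git -C project submodule add "$tmp/upstream" thirdparty-repo
git -C project commit -q -m "add submodule"

# Commit a change inside the submodule on a branch that is never pushed anywhere.
git -C project/thirdparty-repo checkout -q -b my-changes
echo patched > project/thirdparty-repo/lib.txt
git -C project/thirdparty-repo commit -q -a -m "local patch"
# The superproject now records (as a gitlink) a commit that upstream does not have.
git -C project add thirdparty-repo
git -C project commit -q -m "use patched submodule"

# A collaborator clones the superproject and initializes the submodule from .gitmodules.
git clone -q project clone
if git -C clone submodule update --init 2> "$tmp/err.txt"; then
  update_status=ok
else
  update_status=failed
fi
cat "$tmp/err.txt"   # should mention "did not contain <hash>"
```

The final update fails because the recorded commit exists only in project's own copy of the submodule, never in the repository that .gitmodules names.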

How can I fix this? Or is there a better approach to achieve my requirements?


Solution

  • phd already supplied the answer in a comment (fork the submodule), to which you replied:

    I should have mentioned that the 3rd party repo is hosted on GitHub, whereas my project is on a private GitLab. I suppose I could perform the fork in a couple of manual steps, but it also seems like overkill to me. The 3rd party repo is quite large and I'm only modifying a small part of it. Also, I would need to publish two repositories later on. Is there no direct way?

    There is indeed no "direct way", unless of course you can convince the third-party repository owner to take your changes into their repository. You have already seen the reason why:

    Fetched in submodule path 'thirdparty-repo', but it did not contain hash-id

    A Git submodule, as seen in any Git repository, consists of just two parts:

    • Instructions of the form: If you need the repository itself, here is the URL for cloning.
    • Instructions of the form: To make this superproject commit have the correct submodule commit, enter the submodule repository('s working tree) and do a detached-HEAD checkout of the commit whose hash ID is _______ (with the blank filled in).

    The former is an entry in a .gitmodules file; the latter is a gitlink, which is a "file" (in an index entry in the superproject, copied from a commit in the superproject) with mode 160000 instead of mode 100644 (regular read/write file) or 100755 (executable read/write file). All index entries come with a hash ID: when the entry is an ordinary file, that's the hash ID of a blob. When it is a symbolic link (mode 120000), that's the hash ID of a blob containing the target path of the symbolic link. And when the "file" is a gitlink, that's the hash ID of a commit.
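
    To see both parts in an actual repository, the following sketch builds a throwaway superproject and prints the .gitmodules URL and the gitlink entry. The paths and names are hypothetical stand-ins, and a recent Git (2.28+) is assumed:

    ```sh
    set -eu
    tmp=$(mktemp -d)
    cd "$tmp"
    export HOME="$tmp"   # throwaway HOME, keeps your real Git config untouched
    unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
    git config --global user.name demo
    git config --global user.email demo@example.com
    git config --global init.defaultBranch main
    git config --global protocol.file.allow always   # local-path submodules, Git 2.38.1+

    git init -q upstream
    echo v1 > upstream/lib.txt
    git -C upstream add lib.txt
    git -C upstream commit -q -m "upstream v1"
    git init -q project
    git -C project submodule add "$tmp/upstream" thirdparty-repo
    git -C project commit -q -m "add submodule"

    # Part 1: "here is the URL for cloning" lives in .gitmodules.
    url=$(git -C project config -f .gitmodules --get submodule.thirdparty-repo.url)
    echo "url:     $url"

    # Part 2: the gitlink, an index entry printed as "<mode> <hash> <stage><TAB><path>".
    entry=$(git -C project ls-files -s thirdparty-repo)
    mode=$(printf '%s\n' "$entry" | cut -d' ' -f1)
    hash=$(printf '%s\n' "$entry" | cut -d' ' -f2)
    echo "gitlink: $entry"

    # The same entry is stored in the superproject commit's tree, typed as a commit.
    git -C project ls-tree HEAD thirdparty-repo

    # The hash names a commit in the submodule's own repository;
    # git submodule update would check it out as a detached HEAD.
    kind=$(git -C project/thirdparty-repo cat-file -t "$hash")
    echo "object:  $kind"
    ```

    Nothing in the gitlink mentions a branch or a remote: it is only a mode and a commit hash, so whoever clones the superproject must obtain that exact commit from the URL in .gitmodules.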

    (Some people want submodules to work based on branch names rather than on gitlinks. There is paltry support for this in Git, and there is a project to make it better, for which we might have some future hope. For now we're largely stuck with gitlinks. That's not really relevant here, but worth remembering: submodules don't work based on branch names, and for good reason. Branch names are, in general, meaningless, uncontrolled, and not unique the way hash IDs are.)
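
    For reference, the limited branch support looks roughly like the sketch below (same kind of throwaway layout, hypothetical names, Git 2.28+ assumed). A branch recorded in .gitmodules lets git submodule update --remote move the submodule to that branch's tip, but the outcome is still just a new gitlink that the superproject has to commit, and it does nothing for commits that exist only on your machine:

    ```sh
    set -eu
    tmp=$(mktemp -d)
    cd "$tmp"
    export HOME="$tmp"   # throwaway HOME, keeps your real Git config untouched
    unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
    git config --global user.name demo
    git config --global user.email demo@example.com
    git config --global init.defaultBranch main
    git config --global protocol.file.allow always   # local-path submodules, Git 2.38.1+

    git init -q upstream
    echo v1 > upstream/lib.txt
    git -C upstream add lib.txt
    git -C upstream commit -q -m "upstream v1"
    git init -q project
    git -C project submodule add "$tmp/upstream" thirdparty-repo
    # Record a branch name for the submodule in .gitmodules.
    git -C project config -f .gitmodules submodule.thirdparty-repo.branch main
    git -C project add .gitmodules
    git -C project commit -q -m "add submodule tracking main"

    # The third party publishes a new commit.
    echo v2 > upstream/lib.txt
    git -C upstream commit -q -a -m "upstream v2"

    # --remote uses the submodule's remote-tracking branch instead of the recorded gitlink...
    git -C project submodule update --remote thirdparty-repo
    sub_head=$(git -C project/thirdparty-repo rev-parse HEAD)
    upstream_head=$(git -C upstream rev-parse HEAD)

    # ...yet the superproject only sees a changed gitlink, which must still be committed.
    if git -C project diff --quiet -- thirdparty-repo; then
      gitlink_changed=no
    else
      gitlink_changed=yes
    fi
    git -C project diff -- thirdparty-repo
    ```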

    Very few extra resources will be devoted to your GitHub fork. A fork on GitHub is a clone, but one that shares storage with the original, so you're not really adding much load to GitHub (although the parable of camels, backs, and straws comes to mind). And anyone who clones your repo will have to clone some repository once they get to the submodule. That takes as much space as it takes: however many megabytes the third party's repository needs (plus your commits, if the owner merges them) if they clone that repository, or the same megabytes plus your commits if they clone your fork. Since you are only modifying a few files, the extra commits in your fork add hardly any weight: they share all the unmodified files with the earlier commits.
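
    A minimal sketch of the fork workflow, under stated assumptions: a local bare repository (fork.git) stands in for the fork on GitHub or on your private GitLab, upstream stands in for the third-party repository, and all names are hypothetical. The submodule is repointed at the fork with git submodule set-url (Git 2.25+), your branch is pushed there, and the third-party repository stays available as a second remote named upstream so its new commits can be merged:

    ```sh
    set -eu
    tmp=$(mktemp -d)
    cd "$tmp"
    export HOME="$tmp"   # throwaway HOME, keeps your real Git config untouched
    unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
    git config --global user.name demo
    git config --global user.email demo@example.com
    git config --global init.defaultBranch main
    git config --global protocol.file.allow always   # local-path submodules, Git 2.38.1+

    # Third-party repository (read-only for you) and a bare stand-in for your fork.
    git init -q upstream
    echo v1 > upstream/lib.txt
    git -C upstream add lib.txt
    git -C upstream commit -q -m "upstream v1"
    git clone -q --bare upstream fork.git

    # Existing superproject whose submodule still points at the third party.
    git init -q project
    git -C project submodule add "$tmp/upstream" thirdparty-repo
    git -C project commit -q -m "add submodule"

    # Repoint it at the fork: rewrites .gitmodules and syncs the submodule's origin.
    git -C project submodule set-url thirdparty-repo "$tmp/fork.git"
    # Keep the third-party repository as a second remote for later merges.
    git -C project/thirdparty-repo remote add upstream "$tmp/upstream"

    # Commit your patch on a branch and publish it to the fork.
    git -C project/thirdparty-repo checkout -q -b my-changes
    echo patched > project/thirdparty-repo/lib.txt
    git -C project/thirdparty-repo commit -q -a -m "local patch"
    git -C project/thirdparty-repo push -q origin my-changes
    git -C project add .gitmodules thirdparty-repo
    git -C project commit -q -m "use patched submodule from fork"

    # Later: merge new third-party commits, push to the fork, commit the new gitlink.
    echo extra > upstream/new.txt
    git -C upstream add new.txt
    git -C upstream commit -q -m "upstream adds new.txt"
    git -C project/thirdparty-repo fetch -q upstream
    git -C project/thirdparty-repo merge -q --no-edit upstream/main
    git -C project/thirdparty-repo push -q origin my-changes
    git -C project add thirdparty-repo
    git -C project commit -q -m "merge upstream into submodule"

    # A collaborator's clone now works: every recorded commit exists in the fork.
    git clone -q project clone
    git -C clone submodule update --init
    ```

    With a real hosting service you would first create the empty fork project and push the third-party history into it; after that, the recurring part is the fetch, merge, push, and commit-the-gitlink sequence near the end of the script.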