How do I edit an external submodule, and push it to my own repo?

I am working on my thesis, where I extend another paper. The authors have made all their code public in repo X, which itself uses a submodule: let's call that one Y.

For my own project, I have made my own repo Z, in which I have included X as a submodule, and thus Y as a nested submodule. However, I would like to make changes to the code in X and Y, and also be able to push these changes to my own repo so that I can use them in multiple locations. I do not have push permission to X and Y, since they are someone else's repos. What is the best way to go about this? Thanks in advance, I am lost :D

I tried pushing my own repo from its main directory, but then it does not include the code I changed in the submodules. When I went into the submodules and committed the changed code there, Git first gave a warning about a detached HEAD. When I fixed this by checking out a new branch, it now gives me:

git push --set-upstream origin new_branch
ERROR: Permission to [repo X] denied to [myusername].
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

I have now forked repository X and changed the URL in my .gitmodules. What happens to submodule Y now?

Solution

Let's start with background on submodules

Submodules are simple, but with complicated results. A "submodule" consists of two parts: there's a .gitmodules file, and there are two Git repositories. One repository is called the superproject and one is called the submodule. The superproject contains, in one or more commits, the raw hash ID of a commit to check out in the submodule. That's the entire picture—up until we start actually doing work, anyway.
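To see the "raw hash ID" part concretely, here is a minimal sketch that builds a throwaway superproject and submodule in a temporary directory. The names super and sub, the demo identity, and the local-path URL are all made up for illustration; protocol.file.allow=always is set only because the demo clones from a local path, which recent Git versions refuse for submodules by default.

```shell
# Throwaway demo repositories; every name here is hypothetical.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# The repository that will play the submodule, with one commit in it.
git init -q "$work/sub"
git -C "$work/sub" commit -q --allow-empty -m "submodule commit"

# The superproject, which adds the other repository as a submodule.
git init -q "$work/super"
git -C "$work/super" -c protocol.file.allow=always submodule --quiet add "$work/sub" sub
git -C "$work/super" commit -q -m "add submodule"

# The superproject's tree holds a "commit" entry (mode 160000) for the
# submodule path: a raw hash ID, not the submodule's files.
entry=$(git -C "$work/super" ls-tree HEAD sub)
sub_head=$(git -C "$work/sub" rev-parse HEAD)
echo "$entry"
echo "$sub_head"
```

The hash in the ls-tree line should match the submodule's own HEAD, which is all the superproject really knows about the submodule besides its path and URL.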

Suppose you start by cloning the superproject

Let's say you run:

git clone --no-recurse-submodules <url> super
cd super

You now have a commit checked out in the superproject repository, complete with an instruction that Git, running in this superproject, should check out a commit in the submodule. But there isn't any submodule. You've only cloned one repository, not two.

The --no-recurse-submodules option makes sure that Git doesn't go on to clone the submodule yet. That's the default anyway, but we made it explicit here on purpose, to make it obvious, and in case you have changed your personal settings to turn on recursion.

You now have to instruct Git to complete the checkout, if you'd like the submodule cloned and a commit checked out. (If you prefer not to bother with the submodule at all, you can leave things in this state: you just won't have a submodule, even though the superproject calls for one. The actual clone-and-checkout step is optional, provided whatever you're doing doesn't need the submodule. The Git repository for Git, for instance, includes a submodule reference to a program that checks for SHA-1 collisions, but it's specifically optional and you don't need to bother with it.)

To complete the checkout, you now run:

git submodule update --init

(If you use --recurse-submodules in your clone, the git clone command runs this command for you, after the git checkout step, step 6.¹)

There's a problem here: this git submodule update --init must run git clone. The git clone command needs a URL. Where does the URL come from? The solution to this problem is the .gitmodules file, which must exist in the commit checked out in step 6; that file must list the URL(s) for any submodule(s) that must be cloned at this point.
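For reference, a .gitmodules file is an ordinary Git-config-style file, so git config can read it. The sketch below writes a hypothetical one (the submodule name sub and the example.com URL are placeholders, not taken from any real project) and reads the URL back the same way you might inspect a real one.

```shell
# Write a hypothetical .gitmodules file into a scratch directory.
demo=$(mktemp -d)
cat > "$demo/.gitmodules" <<'EOF'
[submodule "sub"]
    path = sub
    url = https://example.com/their/sub.git
EOF

# Read the entries back with git config, pointed at that file.
sub_url=$(git config -f "$demo/.gitmodules" --get submodule.sub.url)
sub_path=$(git config -f "$demo/.gitmodules" --get submodule.sub.path)
echo "clone $sub_url into $sub_path"
```

The url line is what git submodule update --init hands to git clone; the path line says where in the superproject's work-tree the clone goes.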

Once those submodules are cloned, the git submodule update command, which you may now use without the --init flag, picks a commit, by hash ID, that it should git checkout in each submodule listed by the superproject. The hash ID that git submodule update, run in the superproject, checks out in the submodule repository is the one stored in the commit checked out in step 6. More precisely, it is the hash ID that is now in Git's index, but it got into Git's index via the checkout in step 6.

Any submodules now cloned, via git submodule update --init, can themselves be superprojects by listing a commit hash ID and having a .gitmodules file in them. If you use the --recursive option to git submodule update, it will enter each submodule and treat it, in turn, as a superproject in command of its own submodule(s). So that's the process by which git clone --recurse-submodules gets all the submodules of submodules of submodules of submodules (however deep the nesting goes).
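Here is a small sketch of that recursion, assuming a made-up three-level layout (origin containing sub1, which contains sub2) built from local throwaway repositories. The protocol.file.allow=always setting is there only because the demo uses local paths as URLs.

```shell
# Throwaway nested layout; all repository names are hypothetical.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/sub2"
git -C "$work/sub2" commit -q --allow-empty -m "sub2 commit"

git init -q "$work/sub1"
git -C "$work/sub1" -c protocol.file.allow=always submodule --quiet add "$work/sub2" sub2
git -C "$work/sub1" commit -q -m "add sub2"

git init -q "$work/origin"
git -C "$work/origin" -c protocol.file.allow=always submodule --quiet add "$work/sub1" sub1
git -C "$work/origin" commit -q -m "add sub1"

# Clone only the superproject: sub1 exists as an empty directory.
git clone -q --no-recurse-submodules "$work/origin" "$work/super"
before=$(ls -A "$work/super/sub1")

# Now clone and check out sub1, then recurse into it for sub2.
git -C "$work/super" -c protocol.file.allow=always \
    submodule --quiet update --init --recursive
nested_head=$(git -C "$work/super/sub1/sub2" rev-parse HEAD)
```

Without --recursive, the last command would stop after sub1 and leave super/sub1/sub2 empty.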

¹The git clone command is basically shorthand for running six or seven commands, all but one of them being Git commands:

1. mkdir, or whatever command your system has for making a new folder/directory. The remaining commands are run in the new folder.
2. git init: this creates a new, totally empty repository.
3. git remote add: this adds the remote whose name you may select, but everyone uses the default origin, with the URL you gave to git clone.
4. Any git config commands you specified (default none).
5. git fetch origin: this copies all the commits from the Git software that responds at the URL you gave. It copies no branches: their branches become your remote-tracking names instead.
6. git checkout: this creates one branch in your new repository, and checks out the commit. This step is optional as it is suppressed by --no-checkout.
7. git submodule update --init, if you called for it and if there are submodules listed in the commit checked out in step 6. This step is optional and is not the default.
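The steps above can be sketched by hand against a throwaway local repository. This is only an approximation of what git clone does internally, and the repository names and demo identity are made up.

```shell
# A throwaway repository to clone from; names are hypothetical.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$work/origin"
git -C "$work/origin" commit -q --allow-empty -m "initial commit"
branch=$(git -C "$work/origin" symbolic-ref --short HEAD)

mkdir "$work/clone"                                     # step 1
git -C "$work/clone" init -q                            # step 2
git -C "$work/clone" remote add origin "$work/origin"   # step 3
                                                        # step 4: no extra config here
git -C "$work/clone" fetch -q origin                    # step 5
git -C "$work/clone" checkout -q "$branch"              # step 6: creates the branch from origin/$branch
git -C "$work/clone" submodule --quiet update --init    # step 7: a no-op here, no submodules

clone_head=$(git -C "$work/clone" rev-parse HEAD)
```

After step 5 there are only remote-tracking names such as origin/$branch; step 6 is what creates your one local branch.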

Suppose you don't start this way

Imagine that instead of just cloning the superproject:

git clone --no-recurse-submodules <url> super
cd super

you personally first clone the superproject itself, but then you go on to personally clone the submodule too:

git clone --no-recurse-submodules <url1> super
cd super
mkdir -p sub
git clone --no-recurse-submodules <url2> sub

Now you can run git checkout or git switch in super (where you are now), and then git submodule update in super, and Git does not have to clone a new repository. Git just uses the existing submodule clone you've made within the superproject.²

When you use this method, the contents of a .gitmodules file are ignored. So by using this method you don't have to fix any .gitmodules files in any superproject, to use a different submodule URL.

²You might wish to also run git submodule absorbgitdirs after the two git clone commands, so as to switch from the old Git 1.x model of submodule storage to the new Git 2.x model. It's not required, it's just a good idea for somewhat messy "submodules might come and go and then come back again" reasons. When submodules do go away—e.g., by extracting a historical commit—the result is currently quite ugly: submodules have a lot of user-experience warts, leading many to still call them sob-modules, despite the major improvements in them since the early days of Git.

The consequences

What does this all mean? Well, let's say there's some existing superproject that you don't need to touch, but you have a submodule, or a submodule of a submodule, that you do need to touch, and as a result, you want people to clone your submodule, or your sub-sub-module, when they clone the superproject.

All you really have to do is make your own Git repository accessible. Those "in the know" can carefully clone your submodule and put it in place in the superstructure(s) provided by the existing superproject(s). But if you want it to be convenient for others to clone a superproject (only), and have that superproject's git submodule update --init clone your submodule, you must update the .gitmodules file in any Git repository that acts as a superproject for your submodule.

Let's say that you have this structure initially:

super                <-- their superproject
super/sub1           <-- their submodule
super/sub1/sub2      <-- their sub-submodule

There are two .gitmodules files, one in super/sub1/.gitmodules that lists the URL from which one can clone sub2, and one in super/.gitmodules that lists the URL from which one can clone sub1.

Because you've made a new repository that replaces sub2, you must now make a new repository that replaces sub1, in which the .gitmodules file in sub1 lists the new URL for your replacement sub2. But to get the superproject super to list the new URL for your replacement sub1, you must now make a new repository that replaces super, in which the .gitmodules file in super lists the new URL for your sub1 replacement.

In other words, the need to touch .gitmodules files "bubbles up" from the lowest submodule, through every intermediate submodule, and eventually reaches the topmost superproject. This is inevitable if you want to make it convenient for others to clone your bottom-level submodule.
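One level of that "bubbling up" might look like the sketch below, under some assumptions: theirs stands in for the original submodule repository, fork.git stands in for your fork of it, and both are local throwaway repositories (hence protocol.file.allow=always). In a real setup the URL would be your fork's address, and you would repeat the same edit in each repository that acts as a superproject.

```shell
# Throwaway repositories; "theirs", "fork.git", and "super" are hypothetical names.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/theirs"
git -C "$work/theirs" commit -q --allow-empty -m "their commit"
git clone -q --bare "$work/theirs" "$work/fork.git"   # stands in for your fork

git init -q "$work/super"
git -C "$work/super" -c protocol.file.allow=always submodule --quiet add "$work/theirs" sub
git -C "$work/super" commit -q -m "add their submodule"

# Point the committed .gitmodules at the fork ...
git -C "$work/super" config -f .gitmodules submodule.sub.url "$work/fork.git"
# ... copy the new URL into the local config and the submodule's origin remote ...
git -C "$work/super" submodule --quiet sync
# ... and commit, so that future clones of the superproject pick up the fork.
git -C "$work/super" commit -q -am "use my fork of sub"

new_remote=$(git -C "$work/super/sub" remote get-url origin)
```

Newer Git versions (2.25 and later) also offer git submodule set-url, which should cover the config and sync steps in one command.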

But that's an if. Submodules are already horrible; what's another level of horrible-ness here? You can, instead of duplicating other repositories just to update one .gitmodules file in each one, provide an instruction that, after doing a recursive clone or update --init, those who want to use your sub-sub-module should remove the one clone and replace it with yours.
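A lighter-weight variant of "replace their clone with yours", swapped in here as an assumption rather than something the steps above require, is to keep the existing submodule clone and only repoint its origin remote at your fork. That is also one way around the "Permission denied" push error. The sketch uses local throwaway repositories, with fork.git standing in for your fork and new_branch as an arbitrary branch name.

```shell
# Throwaway repositories; all names are hypothetical.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/theirs"
git -C "$work/theirs" commit -q --allow-empty -m "their commit"
git clone -q --bare "$work/theirs" "$work/fork.git"   # stands in for your fork

git init -q "$work/super"
git -C "$work/super" -c protocol.file.allow=always submodule --quiet add "$work/theirs" sub
git -C "$work/super" commit -q -m "add their submodule"

# Inside the existing submodule clone: push to the fork instead of to them.
sub_dir="$work/super/sub"
git -C "$sub_dir" remote set-url origin "$work/fork.git"
git -C "$sub_dir" checkout -q -b new_branch            # leave any detached HEAD
git -C "$sub_dir" commit -q --allow-empty -m "my change"
git -C "$sub_dir" push -q --set-upstream origin new_branch

# Back in the superproject: record the new submodule commit's hash ID.
git -C "$work/super" add sub
git -C "$work/super" commit -q -m "record my submodule commit"

recorded=$(git -C "$work/super" rev-parse HEAD:sub)
pushed=$(git -C "$work/fork.git" rev-parse new_branch)
```

This changes only your local clone: the superproject's .gitmodules still lists the original URL, so anyone else cloning it still has to be told where your commits live.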

It's your choice: make it convenient for others by making it inconvenient for you, or make it more convenient for you by making it inconvenient for others.