Check if two git repositories are related


Given two bare non-shallow git repositories, how can I programmatically (via a Python script) check whether they are related? The repositories may have completely different branches, or equally-named branches that point to different histories. If I simply do a push (possibly with --dry-run), git will just create a new branch even if the two repositories have nothing in common. If I do a "pull" in the reverse direction, git prints "refusing to merge unrelated histories", but with --dry-run it does not indicate any error.

I had the idea of obtaining a list of all commit hashes in both repositories (including all branches and "lost" commits without a branch head) and checking whether they contain a common subset. However, I could not find a way to truly find all hashes.

I need this as part of a script that automatically collects changes made to many repositories and incorporates them into old versions of those repos, but I want to make sure not to accidentally push to the wrong, possibly equally-named but unrelated, repository.


Solution

  • To get a list of all commit hashes in a repo

    git rev-list --all --full-history
    

    This will report the hash of every commit reachable from any ref, with history simplification disabled - which should reliably give you every commit hash.

    (It is possible to "miss" dangling commits, but these generally aren't pushed or fetched anyway and are subject to somewhat arbitrary deletion, so there's no real reason to count them.)

    For the repo you're going to push to, the above should be fine. For the repo you're pushing from, the above will also work, but it may be a waste of time to compare all of the hashes. If you know what change(s) you're applying, and the change is meaningfully applicable, then you should be able to find one of the commits reachable from the change.

    So for example, if you keep refs telling you where branches were last time you sync'd, then you can exclude everything reachable from those refs from your list. (Or if you are only trying to keep particular branches in sync, you can omit --all and just rev-list each of those branches.)
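
Since the question asks for a Python script, the comparison described above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes git is on the PATH, the function names (commit_hashes, share_commits, are_related) are made up for illustration, and it treats two repositories as related if they share at least one commit reachable from a ref.

```python
import subprocess


def commit_hashes(repo_path, revs=("--all",)):
    """Return the set of commit hashes that `git rev-list` reports for `revs`."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--full-history", *revs],
        check=True,           # raise CalledProcessError if git fails
        capture_output=True,
        text=True,
    )
    # rev-list prints one hash per line
    return set(result.stdout.split())


def share_commits(hashes_a, hashes_b):
    """True if the two collections of hashes have at least one hash in common."""
    return not set(hashes_a).isdisjoint(hashes_b)


def are_related(repo_a, repo_b):
    """Assumed definition: related means at least one shared reachable commit."""
    return share_commits(commit_hashes(repo_a), commit_hashes(repo_b))
```

To compare only particular branches on the pushing side, as suggested above, the revs argument can replace --all, for example commit_hashes(src, revs=("main",)), where "main" is just a placeholder branch name. As noted earlier, dangling commits are not covered by this check.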