GitPython: get list of remote commits not yet applied


I am writing a Python script to get a list of the commits that are about to be applied by a git pull operation. The excellent GitPython library is a great starting point, but the subtle inner workings of git are killing me. Here is what I have at the moment (simplified and annotated version):

import git

def pending_authors(path):                        # hypothetical wrapper name
  authors = []                                    # collected author emails
  repo = git.Repo(path)                           # get the local repo
  local_commit = repo.commit()                    # latest local commit
  remote = git.remote.Remote(repo, 'origin')      # remote repo
  info = remote.fetch()[0]                        # fetch changes
  remote_commit = info.commit                     # latest remote commit
  if local_commit.hexsha == remote_commit.hexsha: # local is up to date; end
    return authors
                                                  # for every remote commit
  while remote_commit.hexsha != local_commit.hexsha:
    authors.append(remote_commit.author.email)    # note the author
    remote_commit = remote_commit.parents[0]      # navigate up to the parent
  return authors

Essentially it gets the authors for all commits that will be applied in the next git pull. This is working well, but it has the following problems:

  • When the local commit is ahead of the remote, my code never finds a match and just walks through every commit back to the very first one.
  • A remote commit can have more than one parent, and the local commit can be the second parent. This means that my code will never find the local commit in the remote repository.

I can deal with remote repositories being behind the local one: just look in the other direction (local to remote) at the same time. The code gets messy, but it works. The second problem is what is killing me: now I need to navigate a (potentially unlimited) tree to find a match for the local commit. This is not just theoretical: my latest change was a repo merge which presents this very problem, so my script is not working.

Getting an ordered list of commits in the remote repository, such as repo.iter_commits() does for a local Repo, would be a great help. But I haven't found in the documentation how to do that. Can I just get a Repo object for the Remote repository?

Is there another approach which might get me there, or am I using a hammer to drive screws?
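
One way to frame the problem: the commits a pull would apply are the ones reachable from the remote commit but not from the local commit, following every parent rather than only the first. Below is a minimal sketch of that idea, assuming only that each commit object exposes .hexsha and .parents the way GitPython's Commit objects do. The function name commits_to_apply is hypothetical, and the sketch walks the whole local history, so it is not tuned for large repositories.

```python
from collections import deque


def commits_to_apply(remote_commit, local_commit):
    """Return commits reachable from remote_commit but not from local_commit.

    Hypothetical helper: assumes every commit has .hexsha and .parents.
    """
    # Collect the local commit and all of its ancestors.
    seen_local = set()
    stack = [local_commit]
    while stack:
        commit = stack.pop()
        if commit.hexsha in seen_local:
            continue
        seen_local.add(commit.hexsha)
        stack.extend(commit.parents)

    # Breadth-first walk from the remote commit over ALL parents,
    # stopping at anything the local history already contains.
    pending = []
    visited = set()
    queue = deque([remote_commit])
    while queue:
        commit = queue.popleft()
        if commit.hexsha in visited or commit.hexsha in seen_local:
            continue
        visited.add(commit.hexsha)
        pending.append(commit)
        queue.extend(commit.parents)
    return pending
```

With the objects from the script above, the author list would be [c.author.email for c in commits_to_apply(remote_commit, local_commit)]. When the local commit is ahead of the remote, the remote commit is already part of the local history and the result is empty. Git itself describes this set as the revision range local..remote, so it may also be worth checking whether repo.iter_commits() accepts such a range as its revision argument.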


Solution

  • I realized that the tree of commits was always like this: one commit has two parents, and both parents have the same parent. This means that the first commit has two parents but only one grandparent.

    So it was not too hard to write a custom iterator to go over commits, including diverging trees. It looks like this:

    def repo_changes(commit):
      """Iterator over repository changes starting with the given commit."""
      yield commit                           # return the first commit itself
      while commit.parents:                  # iterate until the root commit
        same_parent(commit.parents)          # check only one grandparent
        next_parent = None
        for parent in commit.parents:        # go over all parents
          yield parent                       # return each parent
          next_parent = parent               # for the next iteration
        commit = next_parent                 # start again from the last parent
    

    The function same_parent() alerts when there are two parents and more than one grandparent. Now it is a simple matter to iterate over the unmerged commits:

    for commit in repo_changes(remote_commit):
      if commit.hexsha == local_commit.hexsha:
        return
      authors.append(commit.author.email)
    

    I have left a few details out for clarity. I never return more than a preset number of commits (20 in my case), to avoid walking all the way to the start of the repo. I also check beforehand that the local repo is not ahead of the remote repo. Other than that, it is working great! Now I can alert all commit authors that their changes are being merged.
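
The details left out above can be sketched as follows. This is only an illustration under stated assumptions: same_parent() is not shown in the answer, so here it is assumed to "alert" by raising ValueError; the names MAX_COMMITS and unmerged_authors are hypothetical; and the 20-commit cap is applied with itertools.islice. Commits only need .hexsha, .parents and .author.email, as GitPython's Commit objects provide.

```python
from itertools import islice

MAX_COMMITS = 20  # cap from the answer; the constant name is hypothetical


def same_parent(parents):
    """Assumed behaviour: raise when several parents have more than one grandparent."""
    if len(parents) < 2:
        return
    grandparents = {gp.hexsha for parent in parents for gp in parent.parents}
    if len(grandparents) > 1:
        raise ValueError("parents do not share a single grandparent")


def repo_changes(commit):
    """Iterator over repository changes starting with the given commit."""
    yield commit
    while commit.parents:
        same_parent(commit.parents)
        next_parent = None
        for parent in commit.parents:
            yield parent
            next_parent = parent
        commit = next_parent


def unmerged_authors(remote_commit, local_commit, limit=MAX_COMMITS):
    """Collect author emails until the local commit or the cap is reached."""
    authors = []
    for commit in islice(repo_changes(remote_commit), limit):
        if commit.hexsha == local_commit.hexsha:
            break
        authors.append(commit.author.email)
    return authors
```

Because the walk stops at the first match, it depends on the order of the parents: if the local commit is the first parent of a merge, the commit on the other side of that merge is never reported. The check that the local repo is not ahead of the remote still has to happen before calling unmerged_authors().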