Tags: python, git, gitpython

Get the diff details of first commit in GitPython


In GitPython, I can walk through the diff for every change in the tree by calling the diff() method between two commit objects. If I call diff() with the create_patch=True keyword argument, a patch string is created for every change (additions, deletions, renames), which I can access through the resulting diff object and dissect for the individual changes.

However, the first commit has no parent to compare against.

import git
from git.compat import defenc
repo = git.Repo("path_to_my_repo")

commits = list(repo.iter_commits('master'))
commits.reverse()

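for i in commits:
    if not i.parents:
        # First commit: there is no parent to diff against, so skip it for now
        continue
    else:
        # Has a parent. Note that a.diff(b) describes the change from a to b,
        # so this patch runs from the commit back to its first parent.
        diff = i.diff(i.parents[0], create_patch=True)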

    for k in diff:
        try:
            # Get the patch message
            msg = k.diff.decode(defenc)
            print(msg)
        except UnicodeDecodeError:
            continue

One option is to call diff-tree through repo.git:

diff = repo.git.diff_tree(i.hexsha, '--', root=True)

But this runs git diff-tree on the whole tree with the given arguments and returns a single string, so I cannot get the information for each file separately.
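If the string really is all that is available, it can still be cut into per-file pieces. The following is a minimal sketch, assuming the text is a unified patch (for example from something like `repo.git.diff_tree(i.hexsha, '--', root=True, p=True)`, where the `p=True` flag is an assumption, not part of the call above). `split_patch_by_file` is a hypothetical helper, not a GitPython function.

```python
import re

# Matches the header git writes at the start of each file's section in a patch.
_FILE_HEADER = re.compile(r"^diff --git a/(.+?) b/(.+)$")


def split_patch_by_file(patch_text):
    """Split a combined unified patch into {path: patch_chunk}.

    Hypothetical helper, not part of GitPython. Assumes every file section
    starts with a "diff --git a/<old> b/<new>" line and that paths do not
    themselves contain the sequence " b/".
    """
    chunks = {}
    current_path = None
    for line in patch_text.splitlines(keepends=True):
        match = _FILE_HEADER.match(line.rstrip("\r\n"))
        if match:
            # Key each chunk on the new-side path of the file.
            current_path = match.group(2)
            chunks[current_path] = ""
        # Lines before the first header (such as a leading commit id) are skipped.
        if current_path is not None:
            chunks[current_path] += line
    return chunks
```

Each value in the returned dict is the patch text for one file, header included, so it can be inspected the same way as the per-change patch strings from diff(). Lines inside a hunk always start with `+`, `-` or a space, so they cannot be mistaken for a file header.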

Maybe there is a way to create a root object of some sort. How can I get the changes introduced by the first commit in a repository?

EDIT

A dirty workaround seems to be comparing to the empty tree by directly using its hash:

EMPTY_TREE_SHA = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

....

    if not i.parents:
        diff = i.diff(EMPTY_TREE_SHA, create_patch=True, **diffArgs)
    else:
        diff = i.diff(i.parents[0], create_patch=True, **diffArgs)

But this hardly seems like a real solution. Other answers are still welcome.
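For what it is worth, the empty-tree hash used in the workaround is not an arbitrary magic number: it is the id git computes for a tree object with no entries. A minimal sketch of that computation, using only the standard library (`git_object_sha1` is a hypothetical helper name, and the result applies to repositories that use SHA-1 object ids):

```python
import hashlib


def git_object_sha1(obj_type, payload=b""):
    """Return the SHA-1 id git assigns to an object of the given type.

    Git hashes a header of the form "<type> <size>", a NUL byte, and then
    the raw payload.
    """
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


# The empty tree is a tree object with a zero-length payload.
EMPTY_TREE_SHA = git_object_sha1("tree")
print(EMPTY_TREE_SHA)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Because the id depends only on the object's content, it is the same in every SHA-1 repository, which is why hard-coding it works even though the tree was never explicitly created.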


Solution

  • The short answer is that you can't, at least not directly. GitPython does not seem to support this.

    Running git show on the commit would work, but GitPython only exposes that through its raw command wrapper (repo.git.show), which again returns a plain string.

    You can, on the other hand, use the stats functionality in GitPython to get per-file change counts, which may give you the information you need:

    import git
    
    repo = git.Repo(".")
    
    commits = list(repo.iter_commits('master'))
    commits.reverse()
    print(commits[0])
    print(commits[0].stats.total)
    print(commits[0].stats.files)
    

    This might solve your problem. If it does not, you would probably be better off trying pygit2, which is based on libgit2, the library that VSTS, Bitbucket and GitHub use to handle Git on their backends. It is probably more feature complete. Good luck.