
List the content of a directory for a specific git commit using GitPython


Using GitPython, I'm trying to list the content of a directory at a given commit (i.e. a "snapshot" of the directory at the time).

In the terminal, what I'd do is:

git ls-tree --name-only 4b645551aa82ec55d1794d0bae039dd28e6c5704

How can I do the same in GitPython?

Based on the answers to a similar question (GitPython get tree and blob object by sha), I've tried recursively traversing base_commit.tree and its .trees, but I haven't gotten anywhere.

Any ideas?


Solution

  • Indeed, traversing the trees and subtrees is the right approach. However, the built-in traverse method can run into problems with submodules. Instead, we can do the traversal ourselves, iteratively, and collect all the blob objects (the blobs are the files in the repo at the given commit). There's no need to call git directly via execute.

    def list_files_in_commit(commit):
        """
        Lists all the files in a repo at a given commit.
    
        :param commit: A GitPython Commit object
        :return: list of file paths, relative to the repository root
        """
        file_list = []
        dir_list = []
        stack = [commit.tree]  # start from the commit's root tree
        while stack:
            tree = stack.pop()
            # blobs are the files stored directly at this level
            for b in tree.blobs:
                file_list.append(b.path)
            # subtrees are directories: record them and visit them later
            for subtree in tree.trees:
                dir_list.append(subtree.path)
                stack.append(subtree)
        # return dir_list as well if you want directories too
        return file_list
    
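Note that `git ls-tree --name-only <commit>` from the question is not recursive: it prints only the entries directly inside the root tree. If that one-level listing is all you need, iterating the tree itself is enough. The sketch below is a minimal illustration; `list_directory` is a hypothetical helper name, and it assumes the GitPython behaviour that `tree / "a/b"` looks up a nested entry. It takes the basename of `.path` rather than `.name` because, as far as I can tell, a `Submodule` entry yielded from a tree may not have its `.name` set.

```python
import posixpath


def list_directory(commit, path=""):
    """Return the names of the entries directly inside one directory of a commit.

    Hypothetical helper. Only one level is listed (files, subdirectories and
    submodule entries), like `git ls-tree --name-only <commit>` does for the
    root tree.

    :param commit: a GitPython Commit object
    :param path: directory path relative to the repo root, without a trailing
                 slash; "" means the root tree
    """
    # `tree / "a/b"` looks up a nested entry; GitPython raises KeyError if absent
    tree = commit.tree / path if path else commit.tree
    # Iterating a Tree yields its direct children. Use the basename of .path
    # instead of .name, because a Submodule entry's .name may not be set.
    return [posixpath.basename(item.path) for item in tree]
```

For example, with `repo = git.Repo(".")`, calling `list_directory(repo.commit("4b645551aa82ec55d1794d0bae039dd28e6c5704"))` should give the same names as the terminal command, and passing a `path` lists one subdirectory instead of the root.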

    If you want the files affected by a given commit, this is available via commit.stats.files.
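As a small illustration of that attribute: `commit.stats.files` is a dict keyed by changed path, whose values hold per-file line counts including `"insertions"` and `"deletions"`. GitPython computes it from a diff against the commit's first parent (or against the empty tree for a root commit). The helper name `summarize_commit_changes` below is made up for this sketch.

```python
def summarize_commit_changes(commit):
    """Return (path, insertions, deletions) for each file changed by `commit`.

    Hypothetical helper built on GitPython's commit.stats.files, which maps
    each changed path to a dict of line counts.
    """
    files = commit.stats.files
    # Sort by path so the output order is stable
    return [
        (path, files[path]["insertions"], files[path]["deletions"])
        for path in sorted(files)
    ]
```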