Checking if an object is in a repo in gitpython


I'm working on a program that will be adding and updating files in a git repo. Since I can't be sure if a file that I am working with is currently in the repo, I need to check its existence - an action that seems to be harder than I thought it would be.

The 'in' operator doesn't seem to work on trees below the root level in gitpython. For example:

>>> repo = Repo(path)
>>> hct = repo.head.commit.tree
>>> 'A' in hct['documents']
False
>>> hct['documents']['A']
<git.Tree "8c74cba527a814a3700a96d8b168715684013857">

So I'm left to wonder, how do people check that a given file is in a git tree before trying to work on it? Trying to access an object for a file that is not in the tree raises a KeyError, so I could wrap each access in try/except. But that feels like a poor use of exception handling for a routine existence check.

Have I missed something really obvious? How does one check for the existence of a file in a commit tree using gitpython (or really any library/method in Python)?

Self Answer

OK, I dug around in the Tree class to see what __contains__ does. Turns out, when searching in subfolders, one has to check for the existence of a file using its full relative path from the repo's root, even when the check is made against a subtree. So a working version of the check I did above is:

>>> 'documents/A' in hct['documents']
True
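
To make the behavior described above concrete, here is a toy model of that membership check. ToyTree is a hypothetical stand-in written for illustration, not GitPython's actual Tree class; it only assumes, as the answer says, that a subtree compares the item against each child's path from the repo root.

```python
class ToyTree:
    """Hypothetical stand-in for a git tree; NOT GitPython's Tree class."""

    def __init__(self, path, entries):
        self.path = path        # path of this tree relative to the repo root
        self.entries = entries  # names of this tree's direct children

    def __contains__(self, item):
        # Membership compares against each child's path from the repo root,
        # not against the bare child name.
        prefix = self.path + "/" if self.path else ""
        return any(item == prefix + name for name in self.entries)


documents = ToyTree("documents", ["A", "B"])
print("A" in documents)            # False
print("documents/A" in documents)  # True
```

Under that assumption, the bare name 'A' never matches inside the 'documents' subtree, while 'documents/A' does.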

Solution

  • EricP's answer has a bug. Here's a fixed version:

    import os

    def fileInRepo(repo, filePath):
        '''
        repo is a gitPython Repo object
        filePath is the full path to the file from the repository root
        returns True if file is found in the repo at the specified path, False otherwise
        '''
        # Git tree paths always use '/', so normalize OS-specific separators
        filePath = filePath.replace(os.path.sep, '/')
        pathdir = os.path.dirname(filePath)
    
        # Build up reference to desired repo path
        rsub = repo.head.commit.tree
    
        # Skip empty elements so that files at the repo root are handled
        for path_element in filter(None, pathdir.split('/')):
    
            # If dir on file path is not in repo, neither is file.
            try:
                rsub = rsub[path_element]
    
            except KeyError:
    
                return False
    
        return filePath in rsub
    

    Usage:

    file_found = fileInRepo(repo, 'documents/A')
    

    This is very similar to EricP's code, but handles the case where the folder containing the file is not in the repo. EricP's function raises a KeyError in that case. This function returns False.

    (I offered to edit EricP's code but was rejected.)
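
For comparison, the try/except approach mentioned in the question can be sketched as below. In Python, catching KeyError around a lookup is generally treated as an idiomatic pattern (often called EAFP) rather than a misuse of exceptions. The helper name path_in_tree is hypothetical, and the sketch assumes the tree object accepts a '/'-separated, repo-relative path in tree[path] and raises KeyError when the path is missing. The demonstration uses a plain dict as a stand-in, so nothing here has been run against a real repository.

```python
def path_in_tree(tree, path):
    """Return True if `path` can be looked up in `tree`, else False.

    Assumption: `tree` supports tree[path] with a '/'-separated,
    repo-relative path and raises KeyError when the path is missing.
    """
    try:
        tree[path]
    except KeyError:
        return False
    return True


# Stand-in for demonstration only: a plain dict keyed by repo-relative path.
fake_tree = {"documents/A": "blob", "README": "blob"}
print(path_in_tree(fake_tree, "documents/A"))  # True
print(path_in_tree(fake_tree, "documents/Z"))  # False
```

With a real repository, the call would presumably look like path_in_tree(repo.head.commit.tree, 'documents/A'), if the root tree's lookup behaves as assumed.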