GitPython `repo.index.commit()` spawns persistent git.exe instance, holds handles to repo


I am trying to use GitPython for some repository manipulation, but ran into problems in my app: handles were being left open where I wouldn't expect them.

While narrowing down the issue, I found that calling repo.index.commit() leaves a handle open to the repository directory (presumably to something in .git\). Later, this causes other operations in my app to fail.

Here is a minimal, runnable unittest that reproduces the problem:

import unittest
import git
import tempfile
import os.path

class Test(unittest.TestCase):

    def testCreateRepo(self):
        with tempfile.TemporaryDirectory(prefix=(__loader__.name) + "_") as mydir:

            # MAKE NEW REPO 
            repo = git.Repo.init(path=os.path.join(mydir, "newRepo"), mkdir=True)
            self.assertTrue(os.path.isdir(os.path.join(repo.working_dir, ".git")), "Failed to make new repo?")

            # MAKE FILE, COMMIT REPO
            testFileName = "testFile.txt"
            open(os.path.join(repo.working_dir, testFileName), "w").close()
            repo.index.add([testFileName])
            self.assertTrue(repo.is_dirty())

            #### 
            # COMMENTING THIS OUT --> TEST PASSES
            repo.index.commit("added initial test file") 
            self.assertFalse(repo.is_dirty())
            #### 

            # adding this does not affect the handle
            git.cmd.Git.clear_cache()


            print("done") # exception thrown right after this, on __exit__

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\%USER%\AppData\Local\Temp\EXAMPLE_gitpython_v3kbrly_\newRepo'

Digging a little deeper, it seems that GitPython spawns multiple git.exe processes, and each of them holds a handle to the root folder of the repo, newRepo.

  • Setting a breakpoint immediately before the error and using Sysinternals Handle to list open handles to newRepo shows that they are held by git.exe (four separate git.exe PIDs, to be precise).
  • Using Sysinternals Process Explorer, I can see that they are all spawned from Eclipse -> python.

Stepping through, it is the call to repo.index.commit() that actually causes the git.exe processes to be spawned.
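The observations above (helper processes that appear on first use and then stay alive, each holding the repo's root folder) fit a "lazily started, cached helper process" pattern. The sketch below is a toy model of that pattern, not GitPython's code: `CachedHelper` is a hypothetical class, a tiny Python echo loop stands in for git.exe, and the assumption that GitPython launches its cached git.exe processes with the repository as their working directory is not confirmed in this post. On Windows, a running process keeps its current directory open, which would be enough to block deleting that directory.

```python
import subprocess
import sys

# Stand-in for git.exe: echo each stdin line back on stdout until stdin closes.
_ECHO = (
    "import sys\n"
    "while True:\n"
    "    line = sys.stdin.readline()\n"
    "    if not line:\n"
    "        break\n"
    "    sys.stdout.write(line)\n"
    "    sys.stdout.flush()\n"
)


class CachedHelper:
    """Hypothetical toy model (not GitPython's implementation) of a helper
    process that is started lazily on first use and then kept for reuse."""

    def __init__(self, cwd):
        self._cwd = cwd
        self._proc = None

    def query(self, text):
        # Start the helper on first use, with `cwd` as its working directory.
        # While it runs, Windows keeps that directory open.
        if self._proc is None:
            self._proc = subprocess.Popen(
                [sys.executable, "-c", _ECHO],
                cwd=self._cwd,
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                text=True,
            )
        # Reuse the same process for every later query (the "cache").
        self._proc.stdin.write(text + "\n")
        self._proc.stdin.flush()
        return self._proc.stdout.readline().rstrip("\n")

    def is_running(self):
        return self._proc is not None and self._proc.poll() is None

    def clear_cache(self):
        # Closing stdin makes the child exit; waiting ensures it is gone
        # (and its directory handle released) before we return.
        if self._proc is not None:
            self._proc.stdin.close()
            self._proc.wait()
            self._proc.stdout.close()
            self._proc = None
```

Used inside a `tempfile.TemporaryDirectory` block, removing the directory would be expected to fail on Windows while the helper is still running and to succeed once `clear_cache()` has been called, which mirrors the behavior seen with `repo.index.commit()`.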


Solution

  • Working with the GitPython devs, I found the answer:

    Because of GitPython's internal caching behavior, you must force a garbage collection and tell the repo to clear its cache. I was doing the latter, but on the wrong object: I called clear_cache() on the class git.cmd.Git instead of on the repo's own repo.git instance.

    The following must be added before your directory is cleaned up (in the code above, before the with block exits and the context manager's __exit__() runs):

    import gc
    gc.collect()
    repo.git.clear_cache()
    

    This doesn't exactly follow the principle of least surprise :) Hopefully the API can be improved in the future.
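If an assertion fails partway through the test, the cleanup calls above never run and the temporary directory still cannot be removed. A minimal sketch that guarantees they run is to wrap them in a small context manager. `released` is a hypothetical helper name; it assumes only that the object passed in exposes `repo.git.clear_cache()`, as in the answer above, and it performs the same two steps in the same order.

```python
import contextlib
import gc


@contextlib.contextmanager
def released(repo):
    """Hypothetical helper: yield `repo`, then release GitPython's cached
    git processes when the block exits, even if an exception was raised.

    `repo` is expected to be a git.Repo; only repo.git.clear_cache() is used.
    """
    try:
        yield repo
    finally:
        # Same steps as the answer: force a garbage collection, then clear
        # the cache on this repo's own Git instance (not on git.cmd.Git).
        gc.collect()
        repo.git.clear_cache()
```

In the test, nest `with released(git.Repo.init(...)) as repo:` inside the `tempfile.TemporaryDirectory` block. The inner `with` exits first, so the cached processes are released before the directory is deleted.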