
Number of lines added and deleted in files using gitpython


How can I get the number of lines added and deleted for each file, just like `git diff --numstat` reports?

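from git import Repo  # GitPython

repo_ = Repo('git-repo-path')
git_ = repo_.git
# run `git diff --numstat HEAD~1` through GitPython's command wrapper
log_ = git_.diff('--numstat', 'HEAD~1')
print(log_)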

This prints the entire output (lines added, lines deleted and file names) as a single string. How can I parse this output to extract the useful information from it?

Output format, one line for every modified file:

num(added) num(deleted) file-name
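Because `--numstat` separates the three fields with tab characters, the string can also be split directly, without a regular expression. A minimal sketch, assuming the standard (non `-z`) `--numstat` layout; `parse_numstat` and the sample file names are hypothetical. Git prints `-` instead of counts for binary files, which this sketch maps to `None`:

```python
from typing import List, Optional, Tuple


# Hypothetical helper: parses the text returned by `git diff --numstat`.
# Each line has the form "<added>\t<deleted>\t<path>"; for binary files git
# prints "-" instead of the counts, which we map to None.
def parse_numstat(output: str) -> List[Tuple[Optional[int], Optional[int], str]]:
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # split at most twice so the path keeps everything after the 2nd tab
        added, deleted, path = line.split("\t", 2)
        rows.append((
            None if added == "-" else int(added),
            None if deleted == "-" else int(deleted),
            path,
        ))
    return rows


# Made-up input in the --numstat layout.
sample = "3\t1\tREADME.md\n10\t0\tsrc/app.py\n-\t-\tlogo.png\n"
for added, deleted, path in parse_numstat(sample):
    print(added, deleted, path)
```

With GitPython you would pass the `log_` string from the code above in place of `sample`.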


Solution

  • If I understand you correctly, you want to extract the data from your log_ variable, reformat it and print it. If that's the case, I think the simplest way to do it is with a regular expression:

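    import re
    
    for line in log_.splitlines():
        # each --numstat line looks like "<added>\t<deleted>\t<path>"
        m = re.match(r"(\d+)\s+(\d+)\s+(.+)", line)
        if m:
            # m[1], m[2], m[3] are the three captured groups (Python 3.6+)
            print("{}: rows added {}, rows deleted {}".format(m[3], m[1], m[2]))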
    

    Once you have the data in the match object m, you can of course format the output any way you want. Getting the hang of regular expressions may take a while, but they can be very helpful for small scripts.
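For example, instead of printing each match right away, the groups can be collected into a dictionary and summed. This is a sketch assuming `log_` holds `--numstat` output; the sample string below is made up. Binary files, which git reports as `-` in both count columns, do not match `\d+` and are therefore skipped:

```python
import re

# Made-up --numstat output standing in for the question's log_ variable.
log_ = "3\t1\tREADME.md\n10\t0\tsrc/app.py\n-\t-\tlogo.png\n"

stats = {}  # file name -> (lines added, lines deleted)
for line in log_.splitlines():
    m = re.match(r"(\d+)\s+(\d+)\s+(.+)", line)
    if m:
        # captured groups are strings; convert the counts to int for arithmetic
        stats[m[3]] = (int(m[1]), int(m[2]))

total_added = sum(a for a, _ in stats.values())
total_deleted = sum(d for _, d in stats.values())
print(stats)
print("total: +{} -{}".format(total_added, total_deleted))
```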

    Be advised, though, that regular expressions tend to be write-only code and can be very hard to debug. For extracting small pieces of data like this, however, they are very helpful.
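One way to soften the write-only problem, shown as a sketch rather than as the answer's original code: compile the same pattern with `re.VERBOSE`, which ignores unescaped whitespace and allows `#` comments inside the pattern, and use named groups so the fields are referenced by name instead of by position. `NUMSTAT_LINE` and the sample string are hypothetical:

```python
import re

# Same pattern as r"(\d+)\s+(\d+)\s+(.+)", spread out with re.VERBOSE so
# unescaped whitespace and "#" comments inside the pattern are ignored.
NUMSTAT_LINE = re.compile(r"""
    (?P<added>\d+)    \s+   # lines added
    (?P<deleted>\d+)  \s+   # lines deleted
    (?P<path>.+)            # file name (rest of the line)
""", re.VERBOSE)

log_ = "3\t1\tREADME.md\n10\t0\tsrc/app.py\n"  # made-up sample output

for line in log_.splitlines():
    m = NUMSTAT_LINE.match(line)
    if m:
        # groupdict() maps each group name to its matched string
        print("{path}: rows added {added}, rows deleted {deleted}".format(**m.groupdict()))
```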