
How to improve git log performance?


I am trying to extract git logs from a few repositories like this:

git log --pretty=format:'%H%x09%ae%x09%an%x09%at%x09%s' --numstat

For larger repositories (like rails/rails) it takes a solid 35+ seconds to generate the log.

Is there a way to improve this performance?


Solution

  • You are correct: generating the report for 56,000 commits takes somewhere between 20 and 35 seconds and produces 224,000 lines (15 MiB) of output. I actually think that's decent performance, but I understand you want it faster.

    Because you are generating a report in a fixed format, and commits already in the history never change, you only have to generate it once. Afterwards, you can reuse the cached output of git log and skip the time-consuming generation. For example:

    git log --pretty=format:'%H%x09%ae%x09%an%x09%at%x09%s' --numstat > log-pretty.txt
    
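An active repository such as rails/rails keeps gaining commits, so a cache like this eventually goes stale. Because existing commits never change, a refresh only needs the commits added since the cache was written. Below is a minimal sketch of that idea as a POSIX shell function. `refresh_log_cache` is a hypothetical name, not a git command; it assumes it runs inside the repository and uses a tab-separated format (`%x09` is git's hex escape for a tab byte, whereas an unquoted `\t` is reduced to a plain `t` by the shell).

```shell
# refresh_log_cache [CACHE_FILE]
# Hypothetical helper (not part of git): bring a cached `git log --numstat`
# report up to date. Run it from inside the repository.
refresh_log_cache() {
  cache=${1:-log-pretty.txt}
  # %x09 is git's hex escape for a tab byte.
  fmt='format:%H%x09%ae%x09%an%x09%at%x09%s'

  # git log prints HEAD first, so the cache's first line starts with the
  # full hash of the newest cached commit (40 characters for SHA-1).
  last=$(head -n 1 "$cache" 2>/dev/null | cut -c 1-40)

  # No cache yet, or the cached tip is no longer part of HEAD's history
  # (for example after a force-push): rebuild from scratch.
  if [ -z "$last" ] || ! git merge-base --is-ancestor "$last" HEAD 2>/dev/null; then
    git log --pretty="$fmt" --numstat > "$cache"
    return
  fi

  # Log only the commits reachable from HEAD but not from the cached tip.
  new=$(mktemp) || return 1
  git log --pretty="$fmt" --numstat "$last..HEAD" > "$new"
  if [ -s "$new" ]; then
    # git log is newest-first, so the new commits go in front. format:
    # uses separator semantics (a newline between entries, none after the
    # last one), so a single newline is added at the join.
    { cat "$new"; printf '\n'; cat "$cache"; } > "$cache.tmp" &&
      mv "$cache.tmp" "$cache"
  fi
  rm -f "$new"
}
```

On a linear history the refreshed file should match a fresh run byte for byte. When merged branches bring in older commits, the order can differ from what a fresh `git log` would print, but each commit reachable from HEAD still appears exactly once, provided the old cache was complete.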

    You might wonder how long it takes to search that entire report for data of interest. That's a worthy question:

    $ tail -1 log-pretty.txt
    30  0   railties/test/webrick_dispatcher_test.rb
    $ time grep railties/test/webrick_dispatcher_test.rb log-pretty.txt 
    …
    30  0   railties/test/webrick_dispatcher_test.rb
    
    real    0m0.012s
    …
    

    Not bad: introducing a "cache" has reduced the time needed from 35+ seconds to about a dozen milliseconds, almost 3000 times as fast.
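The `grep` above returns only the matching numstat line, not the commit it belongs to. Here is a minimal sketch of a query that also reports the commit, written as a shell function with the hypothetical name `commits_touching`. It assumes the cache layout git writes for this command (one header line per commit, followed by `added<TAB>deleted<TAB>path` numstat lines, with blank lines between commits) and, like `grep`, makes a single pass over the file.

```shell
# commits_touching PATH [CACHE_FILE]
# Hypothetical helper: print the header line of every commit in the cached
# log whose --numstat section lists exactly PATH.
commits_touching() {
  awk -v path="$1" '
    BEGIN { FS = "\t" }
    # Blank lines only separate commits.
    $0 == "" { next }
    # numstat line: added<TAB>deleted<TAB>path ("-" counts for binary files).
    NF >= 3 && $1 ~ /^([0-9]+|-)$/ && $2 ~ /^([0-9]+|-)$/ {
      if ($3 == path) print header
      next
    }
    # Any other line is a commit header produced by --pretty=format.
    { header = $0 }
  ' "${2:-log-pretty.txt}"
}
```

For example, `commits_touching railties/test/webrick_dispatcher_test.rb | cut -f 1` would list just the hashes, provided the header fields are tab-separated. Renamed files show up in numstat output in a combined `old => new` form, so an exact path match will not catch those lines.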