python git time

What is the most efficient method to get the last modification time of every file in a git revision?


I want to programmatically list the name and last modification time of every file in a certain revision. Running git log once for every file, as suggested elsewhere, is very slow. Is there a faster way to accomplish this?

Running the script below on a non-trivial repo (SDL) takes 59s on my machine.

#!/usr/bin/env python

import datetime
import subprocess
import time

commit = "HEAD"

start = time.time()

file_names = subprocess.check_output(["git", "ls-tree", "--name-only", "-r", commit], text=True).strip().split("\n")

print(f"[{time.time() - start:.4f}] git ls-tree finished")

# One git log invocation per file: this is the slow part.
file_times = [
    datetime.datetime.fromisoformat(
        subprocess.check_output(
            ["git", "log", "-1", "--pretty=format:%cI", commit, "--", name],
            text=True,
        ).strip()
    )
    for name in file_names
]

print(f"[{time.time() - start:.4f}] git info finished")

Solution

  • The basic idea is to postprocess git log --name-status with whatever per-commit info you want and look for the first occurrence of names you're interested in. The all-of-them version:

     git log --name-status --pretty=%ci | awk -F$'\t' '
             NF==1 { stamp=$0; next }
             !seen[$2]++ { print stamp,$0 }
    ' | sort -t$'\t' -k2,2
    

    and, as always, season to taste. Are you running on spinning rust? When I run that on the SDL default checkout with a cheap SSD, it takes 0.548s, so more than a hundred times faster. But then, it's doing 1500+ times fewer walks through history, so there's that.
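Since the question's script is in Python, here is a rough sketch of the same one-pass idea without awk. The function names parse_log and last_modified are made up for illustration. It assumes the default quoted-path output of git log (names with unusual characters come back quoted) and uses %cI, as in the question, so the stamps parse with fromisoformat. Treat it as a starting point rather than a drop-in replacement: in histories with merges, the result may not match per-file git log exactly.

```python
import datetime
import subprocess


def parse_log(log_text):
    """Map each path to the stamp of the newest commit that lists it.

    Expects the output of: git log --name-status --pretty=format:%cI
    """
    times = {}
    stamp = None
    for line in log_text.splitlines():
        if not line:
            continue  # blank separator lines between commits
        if "\t" not in line:
            # A line without a tab is the commit stamp.
            # Assumption: some Git versions print "Z" for UTC, which
            # fromisoformat only accepts from Python 3.11 on.
            if line.endswith("Z"):
                line = line[:-1] + "+00:00"
            stamp = datetime.datetime.fromisoformat(line)
            continue
        # Name-status lines look like "M<TAB>path"; renames and copies
        # look like "R100<TAB>old<TAB>new", so take the last field.
        path = line.split("\t")[-1]
        # git log walks newest first, so keep only the first stamp seen.
        times.setdefault(path, stamp)
    return times


def last_modified(commit="HEAD"):
    """Return {path: datetime} for every file present in the commit."""
    log_text = subprocess.check_output(
        ["git", "log", "--name-status", "--pretty=format:%cI", commit],
        text=True,
    )
    names = subprocess.check_output(
        ["git", "ls-tree", "--name-only", "-r", commit], text=True
    ).splitlines()
    times = parse_log(log_text)
    # Restrict to files that exist in the commit; deleted paths drop out.
    return {name: times.get(name) for name in names}
```

Calling last_modified("HEAD") inside a checkout should return a dict covering the same files as the git ls-tree call in the question, using two git invocations in total instead of one per file.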