Tags: git, git-blame, git-grep

How to search for a string in all files of a codebase and list the occurrences with their last commit date


I'd like to

  • (1) search the files of a Git repository (not the commit messages!) for a certain string,
  • and then (2) list all occurrences (filename only, but for each result),
  • but (3) also have those occurrences annotated with the last commit date (or potentially other commit info, like the message).

Is that possible?

What I've tried
I can do (1) and (2) easily, for example with grep -r "somestring" ./src | cut -d ":" -f1 (the cut keeps only the filename but still lists each occurrence, which grep's -l flag wouldn't).
But I don't see how to fit in (3). It's something like git blame, but applied to all files.
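
For (1) and (2), git grep can take the place of grep -r: it searches only files tracked by Git, and with -n it also prints the line number of each occurrence, which is the piece a per-line blame would need for (3). A minimal sketch, assuming the search string should be matched literally; list_occurrences is a hypothetical helper name, and paths that contain ":" would confuse the cut:

```shell
# list_occurrences is a hypothetical helper name.
# Usage: list_occurrences STRING [PATH]
# Prints one "path:line" entry per occurrence of STRING in tracked files.
list_occurrences() {
  # -I ignores binary files, -n adds line numbers, -F matches STRING
  # literally, and -e keeps a STRING starting with "-" from being read as
  # an option. cut keeps the path and line number fields.
  git grep -I -n -F -e "$1" -- "${2:-.}" | cut -d: -f1,2
}
```

For example, list_occurrences 'print(' src prints entries such as src/file1.py:12, one per occurrence.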

The overall goal is to get an idea of how recently and how frequently certain constructs (tags) are still being added to the code.

My current manual workaround is to do a global search in VSCode, click each result, and wait until GitLens shows the last commit date for that line.

For example, an output like the following would help me find all occurrences of e.g. "print(" in the codebase, with a blame date after each filename. It shows that the string occurs four times, twice in the same file, and when each of those lines was last added or changed.

$ magiccommand "print("
/src/file1.py   2024-05-06
/src/file5.py   2021-02-13
/src/file5.py   2021-10-01
/src/subdir/core.py   2025-01-03
...
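
Once output in this shape exists, the stated goal (how recently and how often a construct is still added) can be approached by bucketing the dates per year. A sketch that assumes the date is the last whitespace-separated column, as in the example; count_per_year is a hypothetical helper name:

```shell
# count_per_year is a hypothetical helper name.
# Reads "path date" lines (date as YYYY-MM-DD in the last column) on stdin
# and prints how many occurrences fall into each year, oldest year first.
count_per_year() {
  awk '{ print substr($NF, 1, 4) }' | sort | uniq -c
}
```

For the four sample lines shown, this reports two occurrences in 2021 and one each in 2024 and 2025.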

Solution

  • Would the following suit your needs?

    1. Find all files that contain your search pattern
    2. Blame each file from step 1
    3. Filter lines containing your search pattern

    It's slow, but it does what you want. The output is not exactly as specified in your question, but it may work for you.

    Example from git.git:

    $ git grep -l 'printf(' | while IFS= read -r file; do git blame -f "$file"; done | grep 'printf('
    d7d850e2b97 Documentation/CodingGuidelines (Ævar Arnfjörð Bjarmason 2022-10-10 13:37:59 -0700 303)    . %z and %zu as a printf() argument for a size_t (the %z being for
    d7d850e2b97 Documentation/CodingGuidelines (Ævar Arnfjörð Bjarmason 2022-10-10 13:37:59 -0700 305)      printf("%"PRIuMAX, (uintmax_t)v).  These days the MSVC version we
    76644e3268b Documentation/MyFirstContribution.txt (Emily Shaffer           2019-05-17 12:07:02 -0700  179)  printf(_("Pony saying hello goes here.\n"));
    2656fb16ddb Documentation/MyFirstContribution.txt (Emily Shaffer           2019-05-29 13:18:09 -0700  291) existing `printf()` calls in place:
    76644e3268b Documentation/MyFirstContribution.txt (Emily Shaffer           2019-05-17 12:07:02 -0700  298)  printf(Q_("Your args (there is %d):\n",
    76644e3268b Documentation/MyFirstContribution.txt (Emily Shaffer           2019-05-17 12:07:02 -0700  303)      printf("%d: %s\n", i, argv[i]);
    76644e3268b Documentation/MyFirstContribution.txt (Emily Shaffer           2019-05-17 12:07:02 -0700  305)  printf(_("Your current working directory:\n<top-level>%s%s\n"),
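
    Part of the cost likely comes from blaming every line of every matching file. A possible refinement, sketched under the assumption that the pattern is a fixed string, blames only the matching lines by passing one -L n,n range per occurrence (git blame accepts several -L options in one call) and uses --date=short to print one "file, date" pair per occurrence, close to the format in the question. git_grep_dates is a hypothetical helper name, and the date is picked out of the default human-readable blame layout, which is not a stable interface:

    ```shell
    # git_grep_dates is a hypothetical helper name.
    # Usage: git_grep_dates STRING
    # Prints "path<TAB>YYYY-MM-DD" for every line containing STRING (literal match).
    git_grep_dates() {
      pattern=$1
      git grep -I -l -F -e "$pattern" | while IFS= read -r file; do
        # Collect one "-L n,n" range per matching line of this file.
        set --
        for n in $(grep -n -F -e "$pattern" -- "$file" | cut -d: -f1); do
          set -- "$@" -L "$n,$n"
        done
        # Blame only those lines; --date=short prints dates as YYYY-MM-DD.
        git blame --date=short "$@" -- "$file" |
          FILE=$file awk -v OFS='\t' '
            # Take the date right before the line number in "(author date lineno)".
            match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] +[0-9]+\)/) {
              print ENVIRON["FILE"], substr($0, RSTART, 10)
            }'
      done
    }
    ```

    Usage: git_grep_dates 'printf('. The file name is passed to awk through the environment rather than -v so that backslashes in names are not reinterpreted.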
    

    You may want to look at the -p/--porcelain option and its relatives (such as --line-porcelain), which produce output "in a format designed for machine consumption", so you can post-process it to your liking.
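
    As one example of that route, the sketch below reads --line-porcelain output, which repeats the full commit header for every line, and prints the file, an abbreviated commit hash and the commit summary (the first line of the message) for each matching line, covering the "other commit info" mentioned in the question. git_grep_summaries is a hypothetical helper name; the pattern is matched as a fixed string, and whole files are blamed as in the original command:

    ```shell
    # git_grep_summaries is a hypothetical helper name.
    # Usage: git_grep_summaries STRING
    # Prints "path<TAB>hash<TAB>summary" for every line containing STRING.
    git_grep_summaries() {
      git grep -I -l -F -e "$1" | while IFS= read -r file; do
        git blame --line-porcelain -- "$file" |
          PAT=$1 FILE=$file awk '
            BEGIN { header = 1 }
            # A TAB-prefixed line carries the source text; a header follows it.
            /^\t/ {
              if (index(substr($0, 2), ENVIRON["PAT"]))
                print ENVIRON["FILE"] "\t" substr(sha, 1, 8) "\t" summary
              header = 1
              next
            }
            # Header line: "<hash> <orig-line> <final-line> [<count>]".
            header { sha = $1; header = 0; next }
            /^summary / { summary = substr($0, 9) }
          '
      done
    }
    ```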

    You can store it as an executable file named git-magic somewhere in your PATH, with the following content:

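    #!/bin/sh
    # Blame every tracked file that contains the pattern, then keep only the
    # blamed lines that contain it (git grep and grep both treat it as a regex).
    git grep -l -e "$1" | while IFS= read -r file; do
      git blame -f -- "$file"
    done | grep -e "$1"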
    

    and then run it as git magic 'printf('.

    NB. File names with line breaks are not supported.
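
    If such file names do matter, one option is to let git grep terminate each name with a NUL byte (-z) and hand the names to git blame through xargs -0. This is a sketch rather than a drop-in replacement: xargs -0 is supported by GNU and BSD xargs but is missing from older POSIX editions, git_magic_nul is a hypothetical helper name, and the pattern is matched as a fixed string (-F) in both searches:

    ```shell
    # git_magic_nul is a hypothetical helper name.
    # Usage: git_magic_nul STRING
    git_magic_nul() {
      # -z ends each file name with a NUL byte, so names may contain newlines;
      # xargs -0 splits on NUL and runs one "git blame -f -- <file>" per name.
      git grep -I -l -z -F -e "$1" |
        xargs -0 -n 1 git blame -f -- |
        grep -F -e "$1"
    }
    ```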