Tags: git, powershell, git-diff, git-log

Git list all files modified (not added) since a specific commit INCLUDING ones that were added and later modified


I recently started working on a project with a huge code base. I decided to create a local git repo to keep track of all my changes. Rather than downloading all the project's existing files and adding them to git, I only downloaded the ones I needed. As I needed more files, I downloaded them and added them to git.

Now the client wants me to provide a list of all files that I've changed since a particular commit.

git diff --diff-filter=M --name-only $last_deploy_commit_id

gives only the modified files that existed at that commit.

git diff --diff-filter=A --name-only $last_deploy_commit_id

lists all files added since that commit but not (necessarily) modified later on.

git diff --diff-filter=AM --name-only $last_deploy_commit_id

lists all files added OR modified since that commit.

What I want is to have a list of all files that

  • Either, already existed and were modified since that commit
  • Or, didn't exist at that commit, were created AND were later modified, both since that commit.

Is there a way to do this? I'm on Windows, if that helps. I'm open to using some PowerShell based script if need be.


Solution

  • You can pass the --name-status flag to git log to do this, along with a commit range <commit>^..HEAD:

    $ git log --oneline 70f5c30^..HEAD --name-status
    7f6aafa Add poopoo
    A       poopoo.txt
    1d961ae Add hello and goodbye
    M       blar.txt
    M       rawr.txt
    0a1acf9 Add rawr
    A       rawr.txt
    70f5c30 Add blar moo and I'LL BE BACK!
    M       README.md
    A       blar.txt
    

    A commit range of the form A..B excludes its starting point A. To include <commit> itself in the output, the range has to start at its parent, which is <commit>^. See Pro Git: Commit Ranges.

    NOTE: git log is a porcelain command, meaning that its output format is not guaranteed to stay the same in future versions of Git. Normally, if you want to use the output of Git commands in a script, you'd use one of the plumbing commands instead. But since this seems to be a one-time task, using git log just this once seems like a reasonable solution.
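
    For anyone who does want to script this, here is a rough sketch of what a plumbing-based variant might look like. It is not part of the original answer. It walks the range with git rev-list and asks git diff-tree for the name-status entries of each commit. The helper names git_lines, parse_name_status and name_status_entries are invented for illustration, and the sketch does not address how merge commits or unusually named paths are reported.

    ```ruby
    require 'open3'

    # Hypothetical helper: run git with the given arguments and return stdout lines.
    def git_lines(*args)
      out, status = Open3.capture2('git', *args)
      raise "git #{args.join(' ')} failed" unless status.success?

      out.lines.map(&:chomp)
    end

    # Turn diff-tree lines such as "M\tREADME.md" into [status, path] pairs,
    # skipping lines that lack a tab-separated status and path.
    def parse_name_status(lines)
      lines.map { |line| line.chomp.split("\t", 2) }
           .select { |fields| fields.size == 2 }
    end

    # Collect [status, path] pairs for every commit in the range, newest first.
    # --root makes a parentless first commit show up as a set of additions.
    def name_status_entries(range)
      git_lines('rev-list', range).flat_map do |sha|
        parse_name_status(
          git_lines('diff-tree', '--no-commit-id', '--name-status', '-r', '--root', sha)
        )
      end
    end
    ```

    Calling name_status_entries('<commit>^..HEAD') inside the repository should return pairs such as ["M", "README.md"], which can then be grouped per file and filtered in the same way as the git log output.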

    Filtering Out Added but Un-modified Files

    After getting the output above, you could keep only the lines that start with M or A (using grep, or Select-String in PowerShell), group them by filename, and then drop every filename that has a line for A but no line for M.

    I don't want to spend the time to learn enough PowerShell in order to do this, but here's how you could filter the results if you were using a Unix environment with Ruby:

    $ git log --oneline <commit>^..HEAD --name-status | \
        grep --extended-regexp "^(A|M)" | \
        ruby ~/Desktop/stackoverflow-answer.rb
    

    where stackoverflow-answer.rb contains the following:

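    # Read "STATUS<TAB>path" lines and collect the status letters seen for each path.
    statuses = ARGF.each_with_object(Hash.new { |hash, path| hash[path] = [] }) do |line, hash|
      status, path = line.chomp.split("\t")
      hash[path] << status
    end

    # Drop paths whose only entry is a single "A": added but never modified afterwards.
    puts statuses.reject { |_path, seen| seen == ['A'] }.keys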
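
    Since the question mentions Windows, where grep may not be installed, the grep step could also be folded into the Ruby script. The following is a minimal sketch and not part of the original answer. The function name modified_files is made up for illustration. It assumes the input is the raw output of git log --oneline --name-status, with a tab between the status letter and the path, and it ignores every line that is not an A or M entry (so deletions, renames and copies are not tracked).

    ```ruby
    # Hypothetical helper: takes the raw lines of
    # `git log --oneline --name-status <commit>^..HEAD` and returns the paths
    # that have at least one entry other than a single "A".
    def modified_files(lines)
      statuses = Hash.new { |hash, path| hash[path] = [] }

      lines.each do |line|
        # Keep only "A<TAB>path" and "M<TAB>path" entries; commit summary
        # lines and other status letters are skipped.
        match = line.chomp.match(/\A([AM])\t(.+)\z/)
        next unless match

        statuses[match[2]] << match[1]
      end

      # A path whose only entry is "A" was added but never modified afterwards.
      statuses.reject { |_path, seen| seen == ['A'] }.keys
    end
    ```

    Assuming the function is saved as modified_files.rb in the current directory, one way to use it would be:

    git log --oneline --name-status <commit>^..HEAD | ruby -e "require './modified_files'; puts modified_files(STDIN)"

    For the sample log shown earlier, this would list blar.txt, rawr.txt and README.md, and leave out poopoo.txt.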