
Git's `.gitattributes` filter not updating content in the worktree


I want to use a .gitattributes filter to update files on git add (actually, any point before the commit would be fine).

However, although the filter runs, the content in the worktree stays the same.

I made a minimal repo as a showcase:

foo.txt

Foo
<start>
<stop>
Bar

update-toc

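#!/usr/bin/env python3

import re
import sys

def parse(lines):
    # Return the indices of the <start> and <stop> marker lines (-1 if absent).
    start = -1
    stop = -1
    for i, line in enumerate(lines):
        if re.match('<start>', line):
            start = i
        elif re.match('<stop>', line):
            stop = i
    return start, stop

# Diagnostics go to stderr so they do not pollute the filtered stream.
print('STARTING FILTER', file=sys.stderr)
lines = sys.stdin.readlines()
start, stop = parse(lines)
if start != -1 and stop != -1:
    # Replace everything between the two markers with a single line.
    new_lines = lines[:start + 1] + ['baz\n'] + lines[stop:]
else:
    new_lines = lines
# stdout is what Git stores (clean) or writes to the worktree (smudge).
sys.stdout.write(''.join(new_lines))
print('ENDING FILTER', file=sys.stderr)
# Echo the result to stderr as well, for debugging.
sys.stderr.write(''.join(new_lines))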

.gitattributes

*.txt   filter=update-toc

.git/config

[filter "update-toc"]
        smudge = ./update-toc
        clean = ./update-toc
        required

In this repo, if I run the following commands, I can see the filter run successfully, but the content of the file itself is not updated:

$ git add --renormalize foo.txt
STARTING FILTER
ENDING FILTER
Foo
<start>
baz
<stop>
Bar

$ cat foo.txt
Foo
<start>
<stop>
Bar

Further edits and commits never change foo.txt beyond my own modifications.

However, if I remove foo.txt and then check it out again (git checkout), I get the filtered version.

$ rm foo.txt
$ git checkout foo.txt
$ cat foo.txt
Foo
<start>
baz
<stop>
Bar

Is there a way to modify the worktree content as well, or to check it out automatically (maybe by combining the filter with a Git hook)?


Solution

  • This is, for good or ill, the way a clean filter is meant to work. It never alters the content of the working tree copy: it applies only to the stream of bytes that Git sees at its compressor.[1]

    As phd suggested in a comment, the reliable way to update the working tree copy is to do that separately. In theory, if you can get the path name of the working tree file (which you can, using the long-running protocol), you could first filter the file and then update it in place, perhaps asynchronously. But this introduces a lot of potential races, and as the documentation notes, the file may not even exist at that path name at that point (this would not normally be the case for the clean filter, but there might be some corner cases that are not obvious).

    (At this particular point, checking the file out of the index works fine even without a smudge filter since the index copy of the file contains the added line.)
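
Here is a minimal sketch of updating the working tree copy separately. It is not part of the original setup: the helper names `rewrite` and `update_in_place` are hypothetical, and the marker handling simply mirrors the `update-toc` script from the question. The script rewrites each named file on disk and then stages it, so it could be run by hand before committing or called from a hook such as pre-commit.

```python
#!/usr/bin/env python3
"""Hypothetical helper: rewrite working tree files in place, then stage them."""
import subprocess
import sys


def rewrite(lines):
    """Replace whatever sits between <start> and <stop> with one 'baz' line."""
    start = stop = -1
    for i, line in enumerate(lines):
        if line.startswith('<start>'):
            start = i
        elif line.startswith('<stop>'):
            stop = i
    if start == -1 or stop == -1:
        return lines  # markers missing: leave the content untouched
    return lines[:start + 1] + ['baz\n'] + lines[stop:]


def update_in_place(path):
    """Rewrite the file on disk; return True if its content changed."""
    # newline='' keeps the file's own line endings intact.
    with open(path, newline='') as f:
        old = f.read()
    new = ''.join(rewrite(old.splitlines(keepends=True)))
    if new != old:
        with open(path, 'w', newline='') as f:
            f.write(new)
    return new != old


if __name__ == '__main__':
    for name in sys.argv[1:]:
        update_in_place(name)
        # Stage the result so the index and the working tree agree.
        subprocess.run(['git', 'add', '--', name], check=True)
```

Because the rewrite is idempotent, a clean filter that is still configured for the same files finds nothing left to change when `git add` runs it on the already-updated content.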


    [1] The compressor does the zlib compression to make a new internal blob object; it also computes the hash ID at the same time. The resulting hash ID either matches that of an existing object (in which case the new blob is a duplicate that is discarded in favor of the existing one) or is new (in which case the new blob is added to the repository object database). The hash ID of the new blob is now valid: it replaces the old hash ID in the index if the file was in the index before, or becomes the hash ID of the new index entry if the file is new to the index.
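
To make the footnote concrete, here is a small sketch of how a blob's hash ID and its compressed form are derived from the bytes the clean filter emits. It assumes a repository using the default SHA-1 object format; the function names `git_blob_id` and `git_loose_object` are made up for illustration and are not Git APIs.

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """Hash ID of a blob with this content (SHA-1 object format assumed)."""
    # Git hashes a short header ("blob <size>\0") followed by the content.
    header = b'blob %d\0' % len(content)
    return hashlib.sha1(header + content).hexdigest()


def git_loose_object(content: bytes) -> bytes:
    """zlib-compressed header plus content, as stored in a loose object."""
    return zlib.compress(b'blob %d\0' % len(content) + content)


if __name__ == '__main__':
    unfiltered = b'Foo\n<start>\n<stop>\nBar\n'
    filtered = b'Foo\n<start>\nbaz\n<stop>\nBar\n'
    # The index records the ID of the filtered bytes, not of the file on disk.
    print('working tree content:', git_blob_id(unfiltered))
    print('clean filter output: ', git_blob_id(filtered))
```

The two printed IDs differ, which is why the index entry reflects the added line while `cat foo.txt` still shows the unfiltered file.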