Search code examples
gitgit-checkoutbackslashinvalid-charactersgit-history

Remove all files from Git repo history with path having escape \ in filename with git filter-repo


I have special filenames with escape \ characters stored in Git repository on Debian 10 Linux.

Problem: it is not possible to git checkout files on Windows, which have incompatible characters in the filename.

Example:

git log --all --name-only -m --pretty= '*\\*'
"systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
"systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
"systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"

I get following Git errors at Windows checkout:

C:\Git\bin\git.exe reset --hard "5ef1cac3a03304c35b455edf32bd1bb78060c5b9" --
error: invalid path 'systemd/system/default.target.wants/snap-git\x2dfilter\x2drepo-7.mount'
fatal: Could not reset index file to revision '5ef1cac3a03304c35b455edf32bd1bb78060c5b9'.
Done

Problem reproducing steps:

# Clone repository, to be executed on a safe repo:
git clone --no-local /source/repo/path/ /target/path/to/repo/clone/
# Cloning into '/target/path/to/repo/clone'...
# remote: Enumerating objects: 9534, done.
# remote: Counting objects: 100% (9534/9534), done.
# remote: Compressing objects: 100% (4776/4776), done.
# remote: Total 9534 (delta 4215), reused 8043 (delta 3136), pack-reused 0
# Receiving objects: 100% (9534/9534), 7.41 MiB | 16.78 MiB/s, done.
# Resolving deltas: 100% (4215/4215), done.

cd /target/path/to/repo/clone/

# List the files with escape \ from repo history into a list file:
git log --all --name-only -m --pretty= '*\\*' | sort -u >/opt/git_repo_files_w_escape.txt

# Remove the files with escape \ from repo history:
git filter-repo --invert-paths --paths-from-file /opt/git_repo_files_w_escape.txt
Parsed 592 commits
New history written in 0.25 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 71128f3 .gitignore: ADD snap-git to be ignored
Enumerating objects: 9354, done.
Counting objects: 100% (9354/9354), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3694/3694), done.
Writing objects: 100% (9354/9354), done.
Total 9354 (delta 4085), reused 9354 (delta 4085), pack-reused 0
Completely finished after 0.55 seconds.


# List files with escape \ to check result:
git log --format="reference" --name-status --diff-filter=A '*\\*'
# "systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"

#  Unfortunately it seems filter-repo was executed, but log still lists filenames with escape \ :-( 

Question:

1) How to remove all files from Git repo history with path having at least one escape \ character in filename?

(reason: it is not possible to checkout those files on Windows, which have incompatible characters in the filename)

UPDATE1:

Tried to replace \\x2d string to - in input file list as suggested, but git history remove was still unsuccessful:

# List the files with escape \ from repo history into a list file:
git log --all --name-only -m --pretty= '*\\*' | sort -u >/opt/git_repo_files_w_escape.txt

# Replace \\x2d string to - in git_repo_files_w_escape.txt:
sed -i 's/\\\\x2d/-/g' /opt/git_repo_files_w_escape.txt

# Remove the listed files from repo history:
git filter-repo --invert-paths --paths-from-file /opt/git_repo_files_w_escape.txt
Parsed 592 commits
New history written in 0.25 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 71128f3 .gitignore: ADD snap-git to be ignored
Enumerating objects: 9354, done.
Counting objects: 100% (9354/9354), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3694/3694), done.
Writing objects: 100% (9354/9354), done.
Total 9354 (delta 4085), reused 9354 (delta 4085), pack-reused 0
Completely finished after 0.55 seconds.


# List files with escape \ to check result:
git log --format="reference" --name-status --diff-filter=A '*\\*'
# "systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"

#  Unfortunately log still lists filenames with \\x2d :-(

UPDATE2:

Tried to replace \\x2d in git_repo_files_w_escape.txt to \\\\x2d or \x2d but none of them resulted to remove the files having \\x2d in filename from Git history.

UPDATE3:

I'm looking for a working solution based on git filter-repo.

Any more idea?


Solution

  • You fed bad input into filter-repo, based on a common but incorrect assumption about how git log works.

    Look at your own output:

    $ git log --format="reference" --name-status --diff-filter=A '*\\*'
    "systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
    "systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
    "systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"
    

    Let's look at the first line as an example. If you were to store that in a file, which you pass to --paths-from-file, then git-filter-repo is going to be looking for a file named "systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount" to remove. You have no such file in your repository. Instead you have one named systemd/system/default.target.wants/snap-git\x2dfilter\x2drepo-7.mount. (Note that I have removed both " characters and two of the \ characters.)

    The problem here is that you assumed git log would list filenames as-is, which it won't do whenever there are special characters. You can often get around this by setting core.quotepath=false (this particularly helps when you have non-ascii characters), but even that is ignored when you have backslashes.

    Here's something that might work better for you for generating the list of filenames to exclude:

    git log -z --all --name-only -m --pretty= '*\\*' | tr '\0' '\n' | sort -u >/opt/git_repo_files_w_escape.txt
    

    but it assumes you do not have filenames with newline characters. (If you do have files with newline characters, though, then --paths-from-file won't work for you.)

    Even simpler would be bypassing creating a list of files with bad names and just programatically removing them by pattern:

    git filter-repo --filename-callback 'return None if b'\\' in filename else filename'