There is a large Git repository (~4,000 commits) that contains files in CP866 and has no .gitattributes file in the project root. Is there any way to add a .gitattributes (*.txt text working-tree-encoding=CP866) and rewrite the history as if it had existed from the beginning?
I tried git rebase -i --root and got conflicts on every commit after adding .gitattributes in the root.
git rebase is bound to give conflicts on every commit because the encoding changes. I couldn't find a simple way to use git filter-repo: its --blob-callback sees the data but not the file name, which we need to match against the mask *.txt; its --commit-callback sees the file names but only provides blob IDs, so the content would have to be extracted and written back separately.
So the following solution uses git filter-branch. I use --index-filter, which is much faster than --tree-filter (though on 4,000 commits it will still be slow, alas), and the index gives us all the information we need: file names and content. What the code does: first it creates a new .gitattributes, then it loops over all *.txt files recursively, converts them from CP866 to UTF-8, and updates the index. At the end it forces a checkout of the converted files in the proper encoding. I took the main part of the code from an answer by @jthill, thanks! Found via https://stackoverflow.com/search?q=%5Bgit-filter-branch%5D+file+content
Before running any code please make a backup or run the code in a temporary copy of your repository!
Here is the code:
#! /bin/sh
set -e
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --index-filter '
set -e
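# Add .gitattributes to the index of every rewritten commit
f=.gitattributes
updated=$(
echo "*.txt working-tree-encoding=cp866" |
git hash-object -w --stdin --path=$f
)
git update-index --add --cacheinfo 100644,$updated,$f
# Re-encode every tracked *.txt blob from CP866 to UTF-8;
# read names line by line so that paths with spaces or non-ASCII characters survive
git -c core.quotepath=off ls-files "*.txt" | while IFS= read -r f; do
updated=$(
git cat-file blob ":$f" | iconv -f cp866 -t utf-8 |
git hash-object -w --stdin --path="$f"
)
git update-index --add --cacheinfo 100644,$updated,"$f"
done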
' HEAD
# Checkout the files in the proper encoding
find . -name "*.txt" -delete
git restore "*.txt"
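To see what the forced checkout at the end is expected to achieve, here is a minimal sketch in a throwaway repository. It assumes Git 2.18 or newer (for working-tree-encoding) and an iconv that knows CP866; the scratch directory and the file name hello.txt are made up for illustration. It shows that the blob in the index is stored as UTF-8 while the checked-out file is CP866:

```shell
#!/bin/sh
# Sketch: what working-tree-encoding does on check-in and on checkout.
set -e
repo=$(mktemp -d)   # hypothetical scratch location
cd "$repo"
git init -q .
echo '*.txt text working-tree-encoding=CP866' > .gitattributes
# "Привет" followed by a newline, encoded in CP866
printf '\217\340\250\242\245\342\n' > hello.txt
git add .gitattributes hello.txt

# Hex dump of the blob in the index (should be the UTF-8 bytes)
stored=$(git cat-file blob :hello.txt | od -An -tx1 | tr -d ' \n')

# Force a fresh checkout and dump the working-tree file (should be CP866 again)
rm hello.txt
git checkout -q -- hello.txt
worktree=$(od -An -tx1 hello.txt | tr -d ' \n')

echo "index:    $stored"
echo "worktree: $worktree"
```

In a rewritten repository the same two dumps on any *.txt file should differ in the same way: UTF-8 in the object store, CP866 on disk.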
I tested it on my repository https://github.com/phdru/m_librarian, where I only had to convert README.rus.txt from KOI8-R to UTF-8. The real code I used is:
#! /bin/sh
set -e
cd m_librarian
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --index-filter '
set -e
# Replace the old encoding attribute in the existing .gitattributes
f=.gitattributes
updated=$(
git cat-file blob :$f |
sed "s!/README.rus.txt encoding=utf-8!/README.rus.txt working-tree-encoding=koi8-r!" |
git hash-object -w --stdin --path=$f
)
git update-index --add --cacheinfo 100644,$updated,$f
f=README.rus.txt
# If the blob is not valid UTF-8, assume it is still KOI8-R and convert it
if ! git cat-file blob :$f | iconv -f utf-8 -t koi8-r >/dev/null 2>&1; then
updated=$(
git cat-file blob :$f | iconv -f koi8-r -t utf-8 |
git hash-object -w --stdin --path=$f
)
git update-index --add --cacheinfo 100644,$updated,$f
fi
' b4c32de..master
# Checkout the file in the proper encoding
rm README.rus.txt
git restore README.rus.txt
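The if test in the script above relies on iconv exiting with a non-zero status when its input is not valid UTF-8. A small sketch of that check in isolation, assuming an iconv that supports KOI8-R; looks_like_utf8 and the two sample strings are hypothetical names used only here:

```shell
#!/bin/sh
# Sketch: tell UTF-8 text from KOI8-R text by whether iconv accepts it as UTF-8.

# Hypothetical helper: succeeds if stdin decodes as UTF-8 and fits into KOI8-R
looks_like_utf8() {
    iconv -f utf-8 -t koi8-r >/dev/null 2>&1
}

# The word "Привет" plus a newline, as octal escapes in each encoding
utf8_sample='\320\237\321\200\320\270\320\262\320\265\321\202\n'
koi8_sample='\360\322\311\327\305\324\n'

if printf "$utf8_sample" | looks_like_utf8; then r1=utf-8; else r1=koi8-r; fi
if printf "$koi8_sample" | looks_like_utf8; then r2=utf-8; else r2=koi8-r; fi

echo "first sample:  $r1"
echo "second sample: $r2"
```

This is a heuristic: a legacy-encoded file whose bytes happen to form valid UTF-8 would be misclassified, which is unlikely for real Russian text but worth keeping in mind before running the rewrite on other repositories.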