I am trying to migrate a Mercurial repository to Git on Windows 11, using Git Bash, as follows:
MINGW64$ ls
hg-repo/ git-repo/
MINGW64$ cd git-repo
MINGW64$ git init
MINGW64$ ~/fast-export/hg-fast-export.sh -r ../hg-repo/ --force -A ../hg-repo/authors.txt -M main
The migration succeeds, and afterwards I run
MINGW64$ git checkout main
which should leave a working tree with no changes. Instead, I get something like the following:
MINGW64$ git status
On branch main
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        deleted:    Folder1/grünes-Ding.png

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        Änderungen/
        Folder1/grünes-Ding.png
So it looks as if "Folder1/grünes-Ding.png" was deleted and then added again. If I try to restore the file, I get the following:
MINGW64$ git restore Folder1/grünes-Ding.png
error: pathspec 'Folder1/grünes-Ding.png' did not match any file(s) known to git
I think Git does not recognise "Folder1/grünes-Ding.png" here because the ü is represented differently in Git than it appears in Git Bash. "Änderungen/" should also be in the repository: if I delete it from the working directory, it shows up with all of its files as deleted. If I then try to restore these files, I get the same kind of error. The files in this folder do not contain umlauts.
My question is: how can I tell Git to handle folders and files with umlauts?
The only things I have found so far about umlauts concern displaying them correctly in logs or commit messages, which is not the problem here.
My Git configuration and locale look like this:
MINGW64$ git config -l
diff.astextplain.textconv=astextplain
http.sslbackend=openssl
http.sslcainfo=C:/Program Files/Git/mingw64/ssl/certs/ca-bundle.crt
core.autocrlf=input
core.fscache=true
core.symlinks=false
pull.rebase=false
init.defaultbranch=main
difftool.sourcetree.cmd=''
mergetool.sourcetree.cmd=''
mergetool.sourcetree.trustexitcode=true
core.repositoryformatversion=0
core.filemode=false
core.bare=false
core.logallrefupdates=true
core.symlinks=false
core.ignorecase=true
core.quotepath=false
core.fsmonitor=true
i18n.logoutputencoding=UTF-8
MINGW64$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_ALL=
After experimenting with the options of hg-fast-export, I eventually found a solution.
hg-fast-export has two options for handling encodings: -e and --fe. The -e option defines the encoding of the commit messages, author names, etc. in Mercurial so that they can be converted to UTF-8, and --fe defines the encoding of the filenames.
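To illustrate what --fe is meant to do, here is a minimal Python sketch. It assumes the Mercurial repository stores the file name as Latin-1 bytes (which is what the working --fe latin1 run suggests) and that, without --fe, those raw bytes are passed through unchanged; both are assumptions, and to_git_path is a hypothetical stand-in, not hg-fast-export's code. The point is that a Latin-1 ü (byte 0xFC) is not valid UTF-8, while decoding it as Latin-1 and re-encoding gives the UTF-8 sequence 0xC3 0xBC.

```python
from typing import Optional

# Assumption: the Mercurial repo stores this name as Latin-1 bytes.
HG_NAME = "Folder1/grünes-Ding.png".encode("latin-1")


def to_git_path(raw: bytes, fn_encoding: Optional[str]) -> bytes:
    """Hypothetical stand-in for the idea behind --fe.

    fn_encoding=None models running without --fe (bytes passed through
    unchanged, an assumption); otherwise the raw name is decoded with
    fn_encoding and re-encoded as UTF-8.
    """
    if fn_encoding is None:
        return raw
    return raw.decode(fn_encoding).encode("utf-8")


def is_valid_utf8(raw: bytes) -> bool:
    """True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


passthrough = to_git_path(HG_NAME, None)
converted = to_git_path(HG_NAME, "latin1")

print(passthrough, is_valid_utf8(passthrough))  # b'Folder1/gr\xfcnes-Ding.png' False
print(converted, is_valid_utf8(converted))      # b'Folder1/gr\xc3\xbcnes-Ding.png' True
```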
I tried different encodings for the filenames and found that latin1 worked for me. At first, though, I made the mistake of using -fe instead of --fe. But -fe is parsed as -f and -e, not as --fe, so be aware of this! If you use -e, the option --fe is automatically set to the same value, so -fe latin1 sets both encodings to latin1, which then results in wrongly encoded commit messages.
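The -fe pitfall comes from short-option clustering: a single dash followed by several letters is read as several short options. The sketch below reproduces this with Python's standard optparse module, using a hypothetical parser that declares only the options mentioned here. It is not hg-fast-export's real option table; I assume -f is the short form of --force, and the fallback from --fe to -e is modelled on the behaviour described above.

```python
from optparse import OptionParser


def make_parser() -> OptionParser:
    # Hypothetical stand-in: only the options discussed in this post.
    parser = OptionParser()
    parser.add_option("-f", "--force", action="store_true", dest="force", default=False)
    parser.add_option("-e", "--encoding", dest="encoding")
    parser.add_option("--fe", dest="fn_encoding")
    return parser


def effective_encodings(argv):
    """Return (force, encoding, fn_encoding) as the stand-in parser sees argv."""
    options, _ = make_parser().parse_args(argv)
    encoding = options.encoding
    # Assumed fallback: --fe takes the value of -e when it is not given.
    fn_encoding = options.fn_encoding if options.fn_encoding is not None else encoding
    return options.force, encoding, fn_encoding


# "-fe latin1" is clustered into "-f" plus "-e latin1".
print(effective_encodings(["-fe", "latin1"]))   # (True, 'latin1', 'latin1')
# "--fe latin1" only sets the filename encoding.
print(effective_encodings(["--fe", "latin1"]))  # (False, None, 'latin1')
```

With -fe latin1 the commit-message encoding is set too, which matches the garbled commit messages described above; --fe latin1 leaves it untouched.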
In the end, the migration works like this:
MINGW64$ ls
hg-repo/ git-repo/
MINGW64$ cd git-repo
MINGW64$ git init
MINGW64$ ~/fast-export/hg-fast-export.sh -r ../hg-repo/ --force -A ../hg-repo/authors.txt -M main --fe latin1
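After the import, one way you might double-check that every tracked path is valid UTF-8 is to scan the output of git ls-files -z, which prints paths NUL-separated and unquoted. This is a small Python sketch, not part of hg-fast-export; the helper name non_utf8_paths is hypothetical, and the script assumes git is on PATH and is run inside git-repo.

```python
import subprocess
from typing import List


def non_utf8_paths(ls_files_output: bytes) -> List[bytes]:
    """Return every NUL-separated path in `git ls-files -z` output that is not valid UTF-8."""
    bad = []
    for path in ls_files_output.split(b"\0"):
        if not path:
            continue  # the trailing NUL produces an empty last element
        try:
            path.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path)
    return bad


def main() -> None:
    try:
        out = subprocess.run(
            ["git", "ls-files", "-z"], capture_output=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        # git missing, or not run inside a Git repository
        print("could not run git ls-files:", exc)
        return
    bad = non_utf8_paths(out)
    for path in bad:
        print("not UTF-8:", path)
    if not bad:
        print("all tracked paths are valid UTF-8")


if __name__ == "__main__":
    main()
```

A clean git status after git checkout main remains the direct check; this script only helps narrow down which paths are affected when it is not clean.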