linux encoding terminal console filenames

File names look the same but are different after copying


My file names look the same but they are not.

I copied many_img/ from Debian1 to OS X, then from OS X to Debian2 (for maintenance purposes), using rsync -a -e ssh at each step to preserve everything.

If I run ls many_img/img1/*, I get visually the same output on Debian1 and Debian2:

prévisionnel.jpg

But somehow, ls many_img/img1/* | od -c gives different results:

On Debian1:

0000000   p   r 303 251   v   i   s   i   o   n   n   e   l  .   j   p
0000020   g  \n

On Debian2:

0000000   p   r   e 314 201   v   i   s   i   o   n   n   e   l  .   j
0000020   p   g  \n

As a result, my web app on Debian2 cannot match the picture in the file system with the file name stored in the database.

I thought maybe I needed to change the file name encoding, but it looks like it's already UTF-8 on every OS:

convmv --notest -f iso-8859-15 -t utf8 many_img/img1/* 

Returns:

Skipping, already UTF-8

Is there a command to convert all 40 thousand file names on Debian2 back to the form they have on Debian1 (without transferring everything again)? I am not sure whether this is a file name encoding problem or something else.


Solution

  • I finally found the command-line conversion tools I was looking for (thanks @Mark for setting me on the right track!)

    OK, I didn't know OS X stored file names under the hood with a different Unicode normalization form.

    • It appears OS X uses Unicode Normalization Form D (NFD)
    • while file names on Linux are typically in Unicode Normalization Form C (NFC)

    The HFS+ file system stores every file name in UTF-16, in decomposed form. Unicode characters are therefore decomposed on OS X, versus precomposed on Linux.

    é, for instance (Latin small letter e with acute accent), is a single character (U+00E9) on Linux, whereas on OS X it is decomposed (NFD) into a base letter "e" (U+0065) followed by a combining acute accent (U+0301).
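To see the difference concretely, here is a minimal Python sketch (not part of the original commands; the variable names are my own) that rebuilds the two forms of the file name from the question and compares their bytes with the od -c output shown there:

```python
import unicodedata

# Precomposed é (U+00E9), as stored on Debian1.
nfc = "pr\u00e9visionnel.jpg"
# Base letter e (U+0065) + combining acute accent (U+0301), as stored on Debian2.
nfd = "pre\u0301visionnel.jpg"

# The two names render identically but are different code point sequences.
print(nfc == nfd)            # False
print(nfc.encode("utf-8"))   # b'pr\xc3\xa9visionnel.jpg'   (octal 303 251)
print(nfd.encode("utf-8"))   # b'pre\xcc\x81visionnel.jpg'  (octal 314 201)

# Normalizing both names to the same form makes them comparable again.
print(unicodedata.normalize("NFC", nfd) == nfc)   # True
print(unicodedata.normalize("NFD", nfc) == nfd)   # True
```

Both byte strings are valid UTF-8, which is why convmv reported "Skipping, already UTF-8": the encoding is the same, only the normalization form differs.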

    Now about conversion tools:

    1. This command, run on Linux, converts file names from NFD to NFC:

      convmv --notest --nfc -f utf8 -t utf8 /path/to/my/file

    2. This command, run on OS X, rsyncs over ssh and converts NFD to NFC on the fly:

      rsync -a --iconv=utf-8-mac,utf-8 -e ssh path/to/my/local/directory/* user@destinationip:/remote/path/

    I tested both methods and they work like a charm.
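If convmv is not available, roughly the same renaming can be sketched in Python. This is not what convmv does internally, just an approximation under my own assumptions: the function name rename_to_nfc and its dry_run parameter are hypothetical, and it is meant to run on Linux, where the file system keeps NFC and NFD names distinct.

```python
import os
import unicodedata


def rename_to_nfc(root, dry_run=True):
    """Rename entries under root whose names are not in NFC.

    Returns the list of (old_path, new_path) pairs that were, or would be,
    renamed. With dry_run=True nothing on disk is changed.
    """
    changes = []
    # Walk bottom-up so the contents of a directory are handled
    # before the directory itself is renamed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            nfc_name = unicodedata.normalize("NFC", name)
            if nfc_name == name:
                continue  # already in NFC
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, nfc_name)
            if os.path.exists(new_path):
                continue  # never overwrite an existing entry
            changes.append((old_path, new_path))
            if not dry_run:
                os.rename(old_path, new_path)
    return changes
```

Calling rename_to_nfc("many_img") first with the default dry_run=True lists what would change, much like running convmv without --notest; passing dry_run=False performs the renames.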

    Note:

    The --iconv option is only available with rsync version 3, whereas OS X ships an old 2.6.9 version by default, so you'll need to upgrade it first.

    Typically, to check and upgrade:

    rsync --version
    brew install rsync
    echo 'export PATH=/usr/local/bin:$PATH' >> ~/.profile
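If the version check needs to be scripted, a small sketch like the one below could parse the text printed by rsync --version. The helper name rsync_supports_iconv is hypothetical, and it only assumes what the note above says: that any 3.x release accepts --iconv. It does not detect a build compiled without iconv support.

```python
import re


def rsync_supports_iconv(version_output):
    """Return True if the `rsync --version` text reports major version 3 or later."""
    # Matches e.g. "rsync  version 2.6.9  protocol version 29";
    # an optional "v" before the number is tolerated.
    match = re.search(r"rsync\s+version\s+v?(\d+)\.", version_output)
    if match is None:
        raise ValueError("could not find an rsync version number")
    return int(match.group(1)) >= 3


# Illustrative input: the first line printed by the stock OS X rsync.
print(rsync_supports_iconv("rsync  version 2.6.9  protocol version 29"))  # False
```

In practice the input would come from something like subprocess.run(["rsync", "--version"], capture_output=True, text=True).stdout.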