Bash - export txt with special characters


I'm trying to generate a txt with all folders that are not empty.

The problem is that the names of these folders contain "special characters", so instead of listing "Começo" it is saving "ComeÃ§o" (as an example).

I've read about iconv, but from what I read it is a "converter", and I don't want to "convert" files; I want to save the names correctly in the first place, without converting afterwards.

 find /SubFolder/* -type d -not -empty  -exec bash -c 'echo ${0#/Folder/}'  {} \; > /Folder/NotEmpty.txt

Solution

  • There should be no problem. A filename in Linux is just an array of bytes; it is not interpreted as text (i.e., decoded) unless necessary. And in your case, it isn't.

    E.g.:

    [test@localhost t]$ ls
    Começo  xx
    [test@localhost t]$ find . -type d
    .
    ./Começo
    ./xx
    [test@localhost t]$ find . -type d -exec bash -c 'echo ${0#/Folder/}'  {} \;
    .
    ./Começo
    ./xx
    [test@localhost t]$ find . -type d -exec bash -c 'echo ${0#/Folder/}'  {} \; > list.txt
    [test@localhost t]$ cat list.txt
    .
    ./Começo
    ./xx
    [test@localhost t]$ od -c list.txt
    0000000   .  \n   .   /   C   o   m   e 303 247   o  \n   .   /   x   x
    0000020  \n
    0000021
    

    From the od output we can deduce that, because my Linux session uses a UTF-8 locale, the filename Começo is stored UTF-8 encoded: 7 bytes, with ç taking the two bytes 303 247 (octal).

    It's important to understand that commands such as ls and find just emit that sequence of bytes without "decoding" it as text. Decoding is the job of the console (which in my case is set to UTF-8, so I see the names correctly). The same can be said about the generated file list.txt: it contains just the raw bytes of the filenames. Again, they look fine when I cat the file because (and only because) my console uses the matching encoding (UTF-8).
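
    The point about raw bytes can be checked without creating any directory. The following is a minimal sketch, not part of the original session: the variable names are made up for illustration, and LC_ALL=C is set on od only so that implementations which decode multibyte characters still print plain octal bytes, as in the dump above.

    ```shell
    # Build the name from explicit bytes: in UTF-8, "ç" is the two bytes 303 247 (octal).
    name=$(printf 'Come\303\247o')

    # wc -c counts bytes, not characters, so the result does not depend on the locale.
    nbytes=$(printf '%s' "$name" | wc -c | tr -d '[:space:]')

    # od -c prints one entry per byte; non-ASCII bytes are shown as octal numbers.
    dump=$(printf '%s' "$name" | LC_ALL=C od -An -c | tr -s ' ')

    echo "$nbytes"   # 7
    echo "$dump"
    ```

    Nothing in this pipeline knows or cares that the bytes spell "Começo"; only the program that finally draws them on screen has to pick an encoding.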

    Only if I view the file in another environment, such as a console with a different locale or a text viewer or editor that reads it as ISO-8859-1 or some other encoding, will I see the "strange characters":

    [test@localhost t]$ cat list.txt
    .
    ./ComeÃ§o
    ./xx
    

    (After setting my console encoding to ISO-8859-1; in my case, Konsole -> Settings -> Edit profile -> Advanced -> Encoding.)
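
    The same misreading can be reproduced without touching the console settings. This is a rough sketch assuming an iconv command is installed; the variable names are hypothetical. It decodes the UTF-8 bytes as if they were ISO-8859-1, which is what a viewer with the wrong encoding effectively does, and then shows that the damage is reversible because no byte was lost.

    ```shell
    # The UTF-8 bytes of "./Começo", written with octal escapes so this script is pure ASCII.
    utf8_name=$(printf './Come\303\247o')

    # Treat each byte as one ISO-8859-1 character and re-encode the result as UTF-8.
    # The two bytes of "ç" turn into the two characters "Ã" and "§".
    misread=$(printf '%s' "$utf8_name" | iconv -f ISO-8859-1 -t UTF-8)

    # Converting back restores the original byte sequence.
    restored=$(printf '%s' "$misread" | iconv -f UTF-8 -t ISO-8859-1)

    printf '%s\n' "$misread"    # shows ./ComeÃ§o on a UTF-8 terminal
    printf '%s\n' "$restored"   # shows ./Começo on a UTF-8 terminal
    ```

    So iconv is not needed to "fix" list.txt: the file is already correct UTF-8, and the remedy is to open it with a viewer or console that is set to UTF-8.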