Tags: unicode, utf-8, character-encoding, arabic, iconv

Converting only non-utf-8 files to utf-8


I have a set of md files. Some of them are utf-8 encoded, and the others are not (they are actually windows-1256).

I want to convert only non-utf-8 files to utf-8.

The following script can partly do the job:

for file in *.md;
do
    iconv -f windows-1256 -t utf-8 "$file" -o "${file%.md}.🆕.md";
done

I still need to exclude the files that are already utf-8 from this process (maybe using the file command?). Try the following command to see what I mean:

file --mime-encoding *

Notice that although the file command isn't smart enough to detect the right character set of the non-utf-8 files, it is enough in this case that it can distinguish between utf-8 and non-utf-8 files.
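As a rough sketch of that distinction, the loop below prints only the md files that file does not report as utf-8. The function name list_non_utf8 is hypothetical, and the -b (brief) flag is added so that the file name is left out of the output being compared.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the .md files in a directory that `file`
# does not report as utf-8. -b drops the "name: " prefix from the output.
list_non_utf8() {
    local dir=${1:-.} f
    for f in "$dir"/*.md; do
        [[ -f $f ]] || continue        # unmatched glob or not a regular file
        if [[ $(file -b --mime-encoding -- "$f") != utf-8 ]]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling list_non_utf8 with no argument checks the current directory.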

Thanks in advance for help.


Solution

  • You can use, for example, an if statement:

    # -b prints only the encoding, so a file name containing "utf-8" cannot affect the test
    if file -b --mime-encoding "$file" | grep -v -q utf-8 ; then
        iconv -f windows-1256 -t utf-8 "$file" -o "${file%.md}.🆕.md";
    fi
    

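    With -q, grep prints nothing and only sets its exit status. With -v, the match is inverted, so grep reports success when the output of file does not contain utf-8 and failure when it does. The if statement tests that status code and runs iconv only on success.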
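Putting the loop from the question and this test together might look like the sketch below. The function name convert_non_utf8 is hypothetical, and it assumes every file that file does not report as utf-8 really is windows-1256. Output goes through a shell redirection rather than -o, which POSIX iconv does not specify, and files produced by an earlier run are skipped.

```shell
#!/usr/bin/env bash
# Sketch: convert every non-utf-8 .md file in a directory from windows-1256.
# convert_non_utf8 is a hypothetical name; the source encoding is assumed
# to be windows-1256 for every file that `file` does not report as utf-8.
convert_non_utf8() {
    local dir=${1:-.} file enc
    for file in "$dir"/*.md; do
        [[ -f $file ]] || continue            # unmatched glob or not a regular file
        [[ $file == *.🆕.md ]] && continue    # skip output of an earlier run
        enc=$(file -b --mime-encoding -- "$file")
        if [[ $enc != utf-8 ]]; then
            iconv -f windows-1256 -t utf-8 "$file" > "${file%.md}.🆕.md"
        fi
    done
}
```

Pure ASCII files are reported as us-ascii, so this sketch converts them too. Since windows-1256 agrees with ASCII on those bytes, their content comes out unchanged.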