webserver, mojibake, file-encodings, github-pages

set file encoding attributes of *.txt file on webserver


gh-pages is mojibaking a text file. Our CI build server copies some build artifacts to gh-pages.

Locally, I can see that the file encoding is UTF-8, and if I download the file and open it in a text editor, it renders just fine.

However, in Safari, Firefox and Chrome, the special characters (tick, checkmark, etc.) are getting mojibaked. How can I instruct the browser to use the correct file encoding?


Solution

  • Unless it is told, the browser has no reliable way of knowing which encoding to use for a plain-text file. An .htaccess setting can help, but that depends on the web server. A more portable way is to make sure the text file starts with a UTF-8 byte order mark (BOM). One way to do this is as follows:

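    #!/bin/sh
    # Add a UTF-8 byte order mark (BOM) to each given file that lacks one.
    # Requires file(1) and uconv (part of ICU).
    
    if [ $# -eq 0 ]
    then
            echo "usage: $0 files ..." 1>&2
            exit 1
    fi
    
    # "$@" (not $*) keeps file names containing spaces intact.
    for file in "$@"
    do
            echo "# Processing: $file" 1>&2
            if [ ! -f "$file" ]
            then
                    echo "Not a file: $file" 1>&2
                    exit 1
            fi
            TYPE=$(file - < "$file" | cut -d: -f2)
            if echo "$TYPE" | grep -q '(with BOM)'
            then
                    echo "# $file already has BOM, skipping." 1>&2
            else
                    # Keep the original as "$file~" and write the BOM-prefixed copy in its place.
                    # Braces rather than a subshell, so that exit 1 ends the whole script.
                    { mv "$file" "$file~" && uconv -f utf-8 -t utf-8 --add-signature < "$file~" > "$file"; } || { echo "Error processing $file" 1>&2; exit 1; }
            fi
    done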
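
    If uconv is not installed on the build server, the same effect can probably be had with printf alone, since the UTF-8 BOM is just the three bytes EF BB BF. The following is a minimal sketch, not a drop-in replacement: the function names has_bom and add_bom are made up for illustration, and it assumes the file is already valid UTF-8 (nothing is transcoded) and that head -c, od and mktemp are available.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from the script above.

# has_bom FILE: succeed if FILE starts with the UTF-8 BOM bytes EF BB BF.
has_bom() {
        [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# add_bom FILE: prepend the BOM unless it is already there.
# Assumes FILE is already UTF-8; the content itself is copied unchanged.
add_bom() {
        has_bom "$1" && return 0
        tmp=$(mktemp) || return 1
        # Write BOM + original content to a temp file, then copy it back
        # so the original file keeps its permissions.
        { printf '\357\273\277' && cat "$1"; } > "$tmp" && cat "$tmp" > "$1"
        status=$?
        rm -f "$tmp"
        return $status
}
```

    With these defined, something like `for f in *.txt; do add_bom "$f"; done` in the CI step before the copy to gh-pages should be enough, and `has_bom file.txt && echo ok` gives a quick check that the published file really starts with the BOM.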