
Remove an ASCII character and replace it with a non-ASCII one


I want to remove one ASCII character and replace it with a non-ASCII character. My code is:

sed -e 's/[\d100\d130]/g' 

To explain: I want to replace character 100 (decimal, ASCII) with character 135 (decimal, extended ASCII). In short, every occurrence of the first character should be removed and replaced by the second. Is this code valid?


Solution

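  • Decimal 100 is a "d", and 135 is an extended ASCII "ç" (c with cedilla) in code pages such as IBM437 and IBM850.
    Set a to a run of test values (95 to 105, three copies of 135, then 130 to 140):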

    a="$(printf "$(printf '\\x%x' {95..105} 135 135 135 {130..140} )")"
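    The nested printf works in two stages: the inner call only produces escape text, and the outer call turns that text into raw bytes. A minimal sketch using just the two values from the question (the variable names escapes and bytes are illustrative):

```bash
#!/usr/bin/env bash
# Stage 1: the inner printf emits the *text* \x64\x87 (escapes, not bytes).
escapes=$(printf '\\x%x' 100 135)
printf '%s\n' "$escapes"            # prints: \x64\x87

# Stage 2: using that text as a printf format turns each \xHH into one raw byte.
bytes=$(printf "$escapes")

# Dump the raw bytes in hex to confirm: 0x64 is "d", 0x87 is decimal 135.
printf '%s' "$bytes" | od -An -tx1
```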
    

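    Both of these work; tr and $'...' take octal escapes, so \144 is decimal 100 and \207 is decimal 135:

    echo "$a" | tr '\144' '\207'
    echo "$a" | sed -e $'s/\144/\207/g'    # Note the $: bash ANSI-C quoting turns \144 and \207 into bytes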
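    To use other decimal values without converting them to octal by hand, the conversion can be done with printf. A small sketch; replace_byte is a hypothetical helper name, and it only handles single-byte values 0 to 255:

```bash
#!/usr/bin/env bash
# replace_byte FROM TO: replace every byte with decimal value FROM by the
# byte with decimal value TO. Hypothetical helper; values must be 0..255.
replace_byte() {
    local from to
    printf -v from '\\%03o' "$1"   # 100 -> \144
    printf -v to '\\%03o' "$2"     # 135 -> \207
    # LC_ALL=C makes tr treat the input as plain bytes.
    LC_ALL=C tr "$from" "$to"
}

# Hex dump of the result: each 0x64 ("d") becomes 0x87.
printf 'aadddcc' | replace_byte 100 135 | od -An -tx1
```

    tr expands backslash-octal escapes in its set arguments, which is why the helper builds \144 and \207 as text rather than raw bytes.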
    

    To see these characters, write the output to a file and open it with the IBM850 encoding. In a text editor that supports it, you will see the following; every byte 135, including the former d, shows as ç:

    _`abcçefghiçççéâäàåçêëèïî
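    Instead of opening the file in an editor, the bytes can also be decoded on the command line. A sketch assuming an iconv build that knows the IBM850 name (glibc and GNU libiconv accept it), converting to UTF-8 for a UTF-8 terminal:

```bash
#!/usr/bin/env bash
# Same test data as above: bytes 95..105, three 135s, then 130..140.
a="$(printf "$(printf '\\x%x' {95..105} 135 135 135 {130..140})")"

# Replace "d" (octal 144) by byte 135 (octal 207), then decode as IBM850.
printf '%s\n' "$a" | tr '\144' '\207' | iconv -f IBM850 -t UTF-8
```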
    

    UTF-8

    For UTF-8, things are different.
    In Unicode the cedilla is code point U+00E7 (decimal 231, hex E7), which UTF-8 encodes as the two bytes C3 A7. Bash 4.2 or later outputs it with:

    $ printf $'\U0E7'
    ç
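    The code point (231) and its UTF-8 bytes are different things: UTF-8 stores code points from 0x80 to 0x7FF as two bytes. A minimal arithmetic sketch of that two-byte rule (the variable names cp, lead and cont are illustrative):

```bash
#!/usr/bin/env bash
# Two-byte UTF-8 encoding for code points 0x80..0x7FF:
#   lead byte         = 110xxxxx  (top 5 bits of the code point)
#   continuation byte = 10xxxxxx  (low 6 bits)
cp=231                               # "ç", U+00E7
lead=$(( (cp >> 6) | 0xC0 ))
cont=$(( (cp & 0x3F) | 0x80 ))
printf 'U+%04X -> %02X %02X\n' "$cp" "$lead" "$cont"   # U+00E7 -> C3 A7

# Emit the two bytes via octal escapes (\303\247) and dump them to confirm.
printf "\\$(printf '%03o' "$lead")\\$(printf '%03o' "$cont")" | od -An -tx1
```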
    

    Getting the UTF-8 form of code points above 127 (0x7F) and up to 255 (0xFF) can be tricky, because some Bash versions (4.2, for example) encode them incorrectly. This function converts a code point to the correct character:

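    function chr_utf8 {
        # Usage: chr_utf8 DEST_VAR CODE_POINT
        # Stores the UTF-8 encoded character for CODE_POINT in DEST_VAR.
        local val
        [[ ${2?Missing Ordinal Value} -lt 0x80000000 ]] || return 1
    
        if [[ ${2} -lt 0x100 && ${2} -ge 0x80 ]]; then
    
            # bash 4.2 incorrectly encodes
            # \U000000ff as \xff so encode manually:
            # lead byte 110xxxxx, continuation byte 10xxxxxx, as octal escapes
            printf -v val '\\%03o\\%03o' $(( (${2}>>6)|0xc0 )) $(( (${2}&0x3f)|0x80 ))
        else
            printf -v val '\\U%08x' "${2}"
        fi
        # val holds escape text (e.g. \303\247); using it as a format yields the bytes
        printf -v "${1?Missing Dest Variable}" "${val}"
    }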
    
    chr_utf8 a 231
    echo "$a"
    

    Conclusion

    The solution was actually very simple:

    echo "aadddcc" | sed $'s/d/\U0E7/g'       # echo $'\U0E7' should output ç
    aaçççcc
    

    Check that echo $'\U0E7' outputs ç; if it does not, use the chr_utf8 function above.
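    Another option that avoids \U entirely is to write the two UTF-8 bytes of ç (C3 A7, octal 303 247) directly in the sed replacement. A minimal sketch, assuming the terminal displays UTF-8:

```bash
#!/usr/bin/env bash
# \303\247 are the octal escapes for the UTF-8 bytes of "ç" (U+00E7);
# $'...' turns them into raw bytes, so no \U support is needed.
echo "aadddcc" | sed $'s/d/\303\247/g'          # aaçççcc on a UTF-8 terminal

# Byte-level check that does not depend on the terminal.
echo "aadddcc" | sed $'s/d/\303\247/g' | od -An -tx1
```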