ascii non-ascii-characters extended-ascii

Remove an ASCII character and replace it with a non-ASCII one

I want to remove one ASCII character and replace it with a non-ASCII one. My code is:

sed -e 's/[\d100\d130]/g'

To explain: I want to replace "100" (decimal, ASCII) with "135" (decimal, extended ASCII). In short, two characters are involved: one is removed and the other takes its place. Is this code valid?

Solution

Decimal 100 is "d", and decimal 135 is "ç" (c with cedilla) in extended-ASCII code pages such as IBM850.
Set a to a range of test bytes that includes both values:

a="$(printf "$(printf '\\x%x' {95..105} 135 135 135 {130..140} )")"

Both of these work:

echo "$a"| tr '\144' '\207'
echo "$a"| sed -e $'s/\144/\207/g'    # Note the $: bash expands the octal escapes before sed runs

To see these characters, write the output to a file and open it with the IBM850 encoding. In a text editor that supports it you will see the following (the d is now a ç, and the three 135 bytes show as ç as well):

_`abcçefghiçççéâäàåçêëèïî
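Without an editor that understands IBM850, the result can still be checked by dumping the bytes as decimal numbers. A minimal sketch, assuming od is available; the sample string and the variable name bytes are only illustrative:

```shell
# Replace byte 100 ("d") with byte 135, then print every byte as an unsigned decimal.
bytes=$(printf 'abcdef' | LC_ALL=C tr '\144' '\207' | od -An -tu1)
# od pads its columns; the unquoted expansion collapses that padding to single spaces.
bytes=$(echo $bytes)
echo "$bytes"    # the fourth value should be 135 instead of 100
```

If the fourth number is 135, the substitution worked regardless of how the terminal renders the byte.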

UTF-8

For UTF-8, things are different.
The cedilla is Unicode code point 231 decimal (hex E7), which UTF-8 encodes as the two bytes C3 A7. Bash outputs it with this:

$ printf $'\U0E7'
ç

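Getting the UTF-8 encoding of values above 127 (7F) and up to 255 (FF) can be tricky because some Bash versions (4.2, per the comment in the function) encode them incorrectly. This function converts a code point (second argument) to the correct character and stores it in the variable named by the first argument: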

function chr_utf8 {
    local val
    [[ ${2?Missing Ordinal Value} -lt 0x80000000 ]] || return 1

    if [[ ${2} -lt 0x100 && ${2} -ge 0x80 ]]; then

        # bash 4.2 incorrectly encodes
        # \U000000ff as \xff so encode manually:
        # build two octal escapes, lead byte 110xxxxx and continuation byte 10xxxxxx
        printf -v val '\\%03o\\%03o' $(( (${2}>>6)|0xc0 )) $(( (${2}&0x3f)|0x80 ))
    else
        printf -v val '\\U%08x' "${2}"
    fi
    printf -v "${1?Missing Dest Variable}" "${val}"
}

chr_utf8 a 231
echo "$a"
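The manual branch of the function follows the standard two-byte UTF-8 layout (110xxxxx 10xxxxxx). As a rough check of that arithmetic for 231, the sketch below computes the two byte values; the variable names cp, lead, cont and utf8_hex are illustrative and not part of the function:

```shell
cp=231                              # code point of ç (hex E7)
lead=$(( (cp >> 6) | 0xC0 ))        # upper bits go into the lead byte 110xxxxx
cont=$(( (cp & 0x3F) | 0x80 ))      # low six bits go into the continuation byte 10xxxxxx
utf8_hex=$(printf '%02x %02x' "$lead" "$cont")
echo "$utf8_hex"                    # prints: c3 a7
```

These are the same two bytes that the function writes as the octal escapes \303\247.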

Conclusion

The solution was actually very simple:

echo "aadddcc" | sed $'s/d/\U0E7/g'       # echo $'\U0E7' should output ç
aaçççcc

Check that echo $'\U0E7' prints a ç; if it does not, you need the function above.
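If the \U escape is unreliable on a given system, one possible fallback is to spell out the UTF-8 bytes of ç (C3 A7, octal 303 247) and hand them to sed directly. This is only a sketch and not part of the original answer; the variable names cedilla and result are made up for illustration:

```shell
# Build ç from its two UTF-8 bytes; printf's octal escapes do not depend on \U support.
cedilla=$(printf '\303\247')
# LC_ALL=C makes sed work on plain bytes, so the replacement is inserted unchanged.
result=$(printf 'aadddcc\n' | LC_ALL=C sed "s/d/$cedilla/g")
echo "$result"    # shows aaçççcc on a UTF-8 terminal
```

Because the bytes are written explicitly, this variant behaves the same whether or not the shell understands $'\U0E7'.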