I want to remove one ASCII character and replace it with a non-ASCII one. My code is:
sed -e 's/[\d100\d130]/g'
To explain: I want to replace character "100" (ASCII, decimal) with character "135" (decimal). In short, two characters are involved, and one of them gets removed. Is this code valid?
Decimal 100 is "d", and 135 is "ç" (c with cedilla) in extended-ASCII code pages such as IBM850.
Set a to a range of byte values for testing:
a="$(printf "$(printf '\\x%x' {95..105} 135 135 135 {130..140} )")"
Both of these work (octal 144 is decimal 100, octal 207 is decimal 135):
echo "$a"| tr '\144' '\207'
echo "$a"| sed -e $'s/\144/\207/g' # Note the $: bash expands the octal escapes before sed runs
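The octal escapes used above can be derived from the decimal codes with printf. A minimal sketch; the helper name dec_to_octal is made up for illustration:

```shell
# Print a decimal character code as the three-digit octal form used in \NNN escapes.
dec_to_octal() {
    printf '%03o\n' "$1"
}

dec_to_octal 100   # 144, the "d"
dec_to_octal 135   # 207, the "ç" in IBM850
```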
To see these characters, write the output to a file and open it with the IBM850 encoding. In a text editor that supports it you will see the following (ç three times in a row, and the d changed to ç as well):
_`abcçefghiçççéâäàåçêëèïî
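If no editor with IBM850 support is at hand, a hex dump confirms the replacement at the byte level. A small sketch, assuming od is available; LC_ALL=C is added here only to make tr treat its arguments as single bytes regardless of the current locale:

```shell
# Replace d (octal 144, hex 64) with octal 207 (hex 87) and dump the bytes as hex.
dump=$(printf 'abcd' | LC_ALL=C tr '\144' '\207' | od -An -tx1)
echo "$dump"   # the last byte should read 87 instead of 64
```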
For UTF-8, things are different.
The ç is Unicode code point U+00E7 (decimal 231), which UTF-8 encodes as the two bytes C3 A7. It is output with this:
$ printf $'\U0E7'
ç
Getting the UTF-8 encoding of values above 127 (0x7F) and up to 255 (0xFF) can be tricky because some Bash versions (4.2, for example) mis-encode them. This function converts a numeric value to the correct character:
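function chr_utf8 {
    # $1: name of the destination variable, $2: code point (ordinal value)
    local val
    [[ ${2?Missing Ordinal Value} -lt 0x80000000 ]] || return 1
    if [[ ${2} -lt 0x100 && ${2} -ge 0x80 ]]; then
        # bash 4.2 incorrectly encodes \U000000ff as \xff,
        # so build the two UTF-8 bytes manually as octal escapes
        printf -v val '\\%03o\\%03o' $(( (${2}>>6)|0xc0 )) $(( (${2}&0x3f)|0x80 ))
    else
        printf -v val '\\U%08x' "${2}"
    fi
    # expand the escape sequence held in val into the destination variable
    printf -v "${1?Missing Dest Variable}" "${val}"
}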
chr_utf8 a 231
echo "$a"
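The manual branch of the function is ordinary two-byte UTF-8 encoding: the upper bits of the code point go into a lead byte of the form 110xxxxx and the lower six bits into a continuation byte of the form 10xxxxxx. A short sketch of that arithmetic for 231, with variable names (cp, lead, cont, bytes) chosen only for illustration:

```shell
cp=231                           # U+00E7
lead=$(( (cp >> 6) | 0xC0 ))     # upper bits go into the lead byte 110xxxxx
cont=$(( (cp & 0x3F) | 0x80 ))   # lower six bits go into the continuation byte 10xxxxxx
bytes=$(printf '%x %x' "$lead" "$cont")
echo "$bytes"                    # c3 a7
```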
The solution was actually very simple:
echo "aadddcc" | sed $'s/d/\U0E7/g' # echo $'\U0E7' should output ç
aaçççcc
Check that echo $'\U0E7' prints a ç; if it does not, you need the function above.
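Another option that may work where $'\U0E7' does not is to hand sed the UTF-8 bytes of ç (C3 A7) directly, written as octal escapes for printf. This is only a sketch of that alternative, not something tested across sed implementations; the variable names cedilla and result are illustrative:

```shell
# UTF-8 bytes of ç (hex C3 A7) as octal escapes, which POSIX printf understands.
cedilla=$(printf '\303\247')
result=$(echo "aadddcc" | sed "s/d/${cedilla}/g")
echo "$result"
```

On a UTF-8 terminal the output should match the sed example above.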