Tags: regex, sed, character-encoding

How to convert Hex characters to ASCII using sed & regex


I have a file in which several special characters are encoded as hexadecimal character references such as &#xED; (the rest of the text is readable). I would like to use sed to convert them with \xHH, but I can't find a way to pass the hex value matched by my regex into \xHH.

If I hard-code the hex value in the replacement, it works:

[user@Centos7]$ echo "aaa&#xED;aaa" | sed -r 's/&#x([[:xdigit:]]+);/\xED/g'
aaaíaaa

But if I try to reuse the match from my regex to build the character with \xHH, it fails: the result is a literal x followed by the matched value.

[user@Centos7]$ echo "aaa&#xED;aaa" | sed -r 's/&#x([[:xdigit:]]+);/\x\1/g'
aaaxEDaaa

Any idea how to fix this? Thanks.


Solution

  • You can achieve that with Perl and the HTML::Entities module (loaded with -MHTML::Entities):

    echo 'aaa&#xED;aaa' | perl -MHTML::Entities -CS -pe '$_ = decode_entities($_)'
    


    Here,

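    • -CS makes Perl treat STDIN, STDOUT and STDERR as UTF-8, so the decoded characters are printed as UTF-8.
    • -p runs the code for each input line and prints $_ afterwards, so assigning the decoded string to $_ is what gets output.
    • decode_entities($string) replaces the HTML entities found in $string with the corresponding Unicode characters (unrecognized entities are left as is).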
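If you would rather keep the regex from the question, a sketch of an alternative is Perl's /e modifier, which evaluates the replacement as code for every match. As far as I know, GNU sed converts escapes such as \xHH when it parses the s command, before \1 has a value, which would explain why \x\1 gives a literal x followed by the match. The function names decode_hex_refs and decode_hex_refs_in_file below are hypothetical, and only hexadecimal references (&#xHH;) are handled, not named or decimal entities.

```shell
#!/usr/bin/env bash
# Sketch: decode hexadecimal character references (&#xHH;) with the same
# regex as the sed attempt, using Perl's /e so the replacement is run as
# code for each match.
#   hex $1   -> turns the captured hex digits into a number
#   chr(...) -> turns that number into the corresponding character
#   -CS      -> STDIN/STDOUT/STDERR are UTF-8, so code points above 0x7F
#               are written out as UTF-8 characters

# Hypothetical helper: reads text on stdin, writes decoded text to stdout.
decode_hex_refs() {
  perl -CS -pe 's/&#x([[:xdigit:]]+);/chr(hex $1)/ge'
}

# Hypothetical helper for a file given as $1: -CSD additionally makes
# files opened through <> (the file name argument) read as UTF-8.
decode_hex_refs_in_file() {
  perl -CSD -pe 's/&#x([[:xdigit:]]+);/chr(hex $1)/ge' "$1"
}

echo 'aaa&#xED;aaa' | decode_hex_refs
```

Note that this writes í as UTF-8 (two bytes), whereas \xED in the sed command emits the single byte 0xED; if the Latin-1 byte is what you need, dropping -CS should give that for code points up to 0xFF.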