
Convert JSON string literals to UTF-8 characters with Perl or Bash


I have a file full of \u escape codes and want to replace each one with the corresponding UTF-8 character. For example, "\u00FC" should become "ü".

Here is how far I got:

echo 'f\u00FCr' | perl -C -p -e "s/\\\\(u[0-9A-Fa-f]{4})/ chr(hex(sprintf('0x%s', '00FC'))) /ge"

This outputs the expected "für", but only because the code point '00FC' is hard-coded. I can't figure out how to pass the value of the capture group to the sprintf function: $1 and \1 are not working. I guess it will be something very simple, but I don't know what to search for. :-)

Or if there is a better approach for this, please let me know, too!


Solution

  • $1 is correct, although you are mistakenly including the u in the capture.

    But you have to be careful about escaping for the shell. You are apparently using sh or similar (based on your need to escape the \), so certain characters, including $, must be escaped inside double quotes. Your shell is interpolating $1 (its own first positional parameter, usually empty) before perl sees it. It's best to use single quotes.

    perl -C -pe's/\\u([0-9A-Fa-f]{4})/ chr(hex($1)) /ge'
    

    Note that sprintf('0x%s', '00FC') is equivalent to '0x' . '00FC', but hex doesn't require the leading 0x. '00FC' (and thus $1) is sufficient.