How can a text string be turned into UTF-8 encoded bytes using Bash and/or common Linux command line utilities? For example, in Python one would do:
"Six of one, ½ dozen of the other".encode('utf-8')
b'Six of one, \xc2\xbd dozen of the other'
Is there a way to do this in pure Bash:
STR="Six of one, ½ dozen of the other"
<utility_or_bash_command_here> --encoding='utf-8' "$STR"
'Six of one, \xc2\xbd dozen of the other'
Perl to the rescue!
echo "$STR" | perl -pe 's/([^\x00-\x7f])/"\\x" . sprintf "%x", ord $1/ge'
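The /e modifier lets you put code in the replacement part of the s/// substitution; here it converts the ordinal value (ord) of each non-ASCII byte to hex via sprintf. Because Perl reads its input as bytes by default, each byte of a multi-byte UTF-8 character is matched and escaped separately.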
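If Perl is not available, something similar can be sketched with Bash plus od from coreutils. This is only a rough sketch: the function name utf8_escape is made up for illustration, and it assumes the string is already stored as UTF-8 bytes (i.e. the script or terminal uses a UTF-8 encoding). It dumps the string as hex bytes, turns the ASCII ones back into characters, and leaves the rest as \xNN escapes.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not a standard utility): print "$1" with every
# non-ASCII byte replaced by a \xNN escape.
# Assumption: "$1" already holds UTF-8 encoded bytes.
utf8_escape() {
    local byte ch out=''
    # od prints each input byte as two hex digits separated by whitespace;
    # the unquoted $(...) splits that output into one word per byte.
    for byte in $(printf '%s' "$1" | od -An -v -tx1); do
        if (( 16#$byte < 0x80 )); then
            # ASCII byte: turn the hex digits back into the character.
            printf -v ch '%b' "\\x$byte"
            out+=$ch
        else
            # Non-ASCII byte: keep it as a literal \xNN escape.
            out+="\\x$byte"
        fi
    done
    printf '%s\n' "$out"
}

STR="Six of one, ½ dozen of the other"
utf8_escape "$STR"
# With this script saved as UTF-8, this prints:
# Six of one, \xc2\xbd dozen of the other
```

If the text is in some other encoding, it could be converted first with iconv -t UTF-8 before being passed to the function.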