There are several shell-specific ways to include a ‘unicode literal’ in a string. For instance, in Bash, the quoted string-expanding mechanism, $'', allows us to directly embed a non-ASCII character: $'\u2620'.
However, if you're trying to write universally cross-platform shell scripts (generally, this can be reduced to “runs in Bash, Zsh, and Dash”), that's not a portable feature.
I can portably achieve anything in the ASCII table (octal number-space) with a construct like the following:
WHAT_A_CHARACTER="$(printf '\036')"
… however, POSIX / Dash printf only supports octal escapes.
I can also obviously achieve the full Unicode space by farming the task out to a fuller programming environment:
OH_CAPTAIN_MY_CAPTAIN="$(ruby -e 'print "\u2388"')"
TAKE_ME_OUT_TONIGHT="$(node -e 'console.log("\u266C")')"
So: what's the best way to encode such a character into a shell script that works in dash, bash, and zsh?

If you have GNU printf installed (it's in the Debian package coreutils, for example), then you can use it regardless of which shell you are using by avoiding the shell's builtin:
env printf '\u2388\n'
Here I am using the POSIX-standard env command to avoid the printf builtin, but if you happen to know where printf is, you could call it directly by its complete path, such as
/usr/bin/printf '\u2388\n'
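If you can't be sure the external printf understands \u (or that the locale is UTF-8), one option is to probe it at run time and fall back otherwise. The following is only a sketch: the variable names are made up for illustration, and the fallback bytes are U+2388 hand-encoded as UTF-8 (E2 8E 88) in octal, which any POSIX printf should accept.

```shell
# Expected UTF-8 bytes for U+2388 (E2 8E 88), written as POSIX octal escapes.
expected="$(printf '\342\216\210')"

# Ask the external printf (not the builtin) to expand \u2388.
# Errors are discarded; "|| true" keeps a failing printf from aborting the script.
actual="$(env printf '\u2388' 2>/dev/null || true)"

if [ "$actual" = "$expected" ]; then
    HELM="$actual"      # the external printf understands \u in this locale
else
    HELM="$expected"    # fall back to the hand-encoded bytes
fi

printf '%s\n' "$HELM"
```

Either branch leaves the same three bytes in HELM, so the rest of the script doesn't need to care which printf is installed.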
If both your external printf and your shell's builtin printf only implement the POSIX standard, you need to work harder. One possibility is to use iconv to translate to UTF-8, but while the POSIX standard requires that there be an iconv command, it does not in any way prescribe the way standard encodings are named. I think the following will work on most POSIX-compatible platforms, but the number of subshells created might be sufficient to make it less efficient than a "heavy" script interpreter:
printf $(printf '\\%o' $(printf %08x 0x2388 | sed 's/../0x& /g')) |
iconv -f UTF-32BE -t UTF-8
The above uses the printf builtin to force the hexadecimal codepoint value to be 8 hex digits long, then sed to rewrite them as 4 hex constants, then printf again to change the hex constants into octal notation, and finally another printf to interpret the octal character constants into a four-byte sequence which can be fed into iconv as big-endian UTF-32. (It would be simpler with a printf which recognizes \x escape codes, but POSIX doesn't require that and dash doesn't implement it.)
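To make the stages easier to follow, here is the same pipeline pulled apart into intermediate variables (the stage1 to stage3 names are just for illustration). It stops before the last printf and iconv, so it only shows how the escape string is built:

```shell
# Stage 1: pad the code point to eight hex digits (one 32-bit value).
stage1="$(printf '%08x' 0x2388)"                        # 00002388

# Stage 2: split into four 0x-prefixed byte constants (note the trailing space).
stage2="$(printf '%s' "$stage1" | sed 's/../0x& /g')"   # 0x00 0x00 0x23 0x88

# Stage 3: convert each byte constant to a backslash-octal escape.
# $stage2 is deliberately unquoted so it splits into four arguments.
stage3="$(printf '\\%o' $stage2)"                       # \0\0\43\210

printf '%s\n' "$stage1" "$stage2" "$stage3"
```

Using stage3 as the format string of one more printf then emits the four raw bytes 00 00 23 88, which is U+2388 in big-endian UTF-32.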
You can use the line without modification to print more than one symbol, as long as you provide the Unicode codepoints (as integer constants) for all of them (example executed in dash):
$ printf $(printf '\\%o' $(printf %08x 0x2388 0x266c 0xA |
> sed 's/../0x& /g')) |
> iconv -f UTF-32BE -t UTF-8
⎈♬
$
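If you need this in more than one place, it may be convenient to wrap the pipeline in a function. This is a sketch rather than a tested library routine: unichr is a name I made up, and it still assumes your iconv accepts the encoding names UTF-32BE and UTF-8.

```shell
# unichr: hypothetical helper; prints each code point argument as UTF-8.
# Assumes iconv knows the names UTF-32BE and UTF-8 (POSIX does not guarantee this).
unichr() {
    printf "$(printf '\\%o' $(printf '%08x' "$@" | sed 's/../0x& /g'))" |
        iconv -f UTF-32BE -t UTF-8
}

# Capture the characters once and reuse them later in the script.
HELM="$(unichr 0x2388)"
NOTES="$(unichr 0x266c)"
printf '%s %s\n' "$HELM" "$NOTES"
```

Capturing the result in a variable once keeps the subshell cost out of any loops.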
Note: As Geoff Nixon mentions in a comment, the fish shell (which is nowhere close to the POSIX standard and, as far as I can see, has no aspirations to conform) will complain about the unquoted %08x format argument to printf, because it expects words starting with % to be jobspecs. So if you use fish, add quotes to the format argument.