Consider the following example line of text:
α Arietis, called Hamal, is the brightest star in Aries. Its traditional name is derived from the Arabic word for “lamb” or “head of the ram” (ras al-hamal).
It has three non-ASCII characters: the α, a left smart quote, and a right smart quote.
My goal is to transliterate as much as possible from UTF-8 to plain ASCII, but leave any non-convertible characters as-is. (In the sample text above, the smart quotes can be transliterated to ", but the α cannot.)
My current command is:
iconv -f UTF-8 -t ASCII//TRANSLIT < iconv.sample
However, it fails to convert the α and terminates with:
iconv: (stdin):1:0: cannot convert
If I add //IGNORE to the target encoding or use the -c option, it drops the α altogether.
How can I transliterate if possible, but fallback to the original input character if not?
I'm not sure it's possible with iconv, because the output always has to conform to the target encoding: if you specify ASCII, it will only ever emit ASCII, no matter what.
If you have uconv (from ICU) available, you can specify the transliteration separately from the output encoding:
uconv -f "UTF-8" -t "UTF-8" -x "Latin-ASCII"
As an example:
$ echo "α Arietis “head of the ram”" | uconv -f "UTF-8" -t "UTF-8" -x "Latin-ASCII"
α Arietis "head of the ram"