linux bash utf-8 iconv ebcdic

Conversion from EBCDIC to UTF8 in Linux


I used Perl to import a table from our AS/400 DB2 database.

The problem is that the strings are encoded in EBCDIC Latin-1 (Italian language).

How can I convert the resulting file to plain UTF-8 in Linux bash?


Solution

  • It's simple with iconv.

    iconv -f ISO8859-1 -t UTF-8 result.csv -o new_result.csv
    

    ISO8859-1 is the Latin-1 encoding. For a list of encodings, refer to this table from the official IBM documentation: https://www.ibm.com/support/knowledgecenter/ssw_aix_53/com.ibm.aix.nls/doc/nlsgdrf/iconv.htm%23d722e3a267mela

    Note that the converted file may still contain unwanted characters carried over from the EBCDIC data. One example is NUL characters (hex 00) embedded in the strings. To get rid of them, open the file in a hex editor and replace every 00 byte with 20 (the space character).
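
If you would rather not open a hex editor, the NUL cleanup can probably be done in the same pipeline with `tr`. This is only a minimal sketch: the sample data written by `printf` is made up for illustration and simply stands in for the real `result.csv`, and it assumes the file really is Latin-1 with stray 00 bytes, as described above.

```shell
#!/bin/sh
# Hypothetical stand-in for result.csv: Latin-1 text with an embedded NUL.
# \350 and \340 are the Latin-1 bytes for "è" and "à"; \000 is the NUL byte.
printf 'caff\350;citt\340\000\n' > result.csv

# Convert Latin-1 to UTF-8, then turn every NUL byte (00) into a space (20).
iconv -f ISO-8859-1 -t UTF-8 result.csv | tr '\000' ' ' > new_result.csv
```

With your own data you would skip the `printf` line and run only the `iconv ... | tr ...` pipeline. If the file turns out to be in a real EBCDIC code page rather than Latin-1, the `-f` argument would have to name that code page instead; `iconv -l` prints the encoding names your system supports.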