
How does an AS/400 recognize the encoding of a file? cat correctly prints two different encodings


In QSHELL, I run cat on two files that contain the same text in different encodings, and both print the same content (hell.txt is EBCDIC and hellascii.txt is ASCII):

    $ cat hellascii.txt
    hello @@@@@@@

    $ cat hell.txt
    hello @@@@@@@

    $ od -x hell.txt
    0000000 4040 4040 8885 9393 9640 7c7c 7c7c 7c7c
    0000020 7c25
    0000022

    $ od -x hellascii.txt
    0000000 2020 2020 6865 6c6c 6f20 4040 4040 4040
    0000020 4000
    0000021

On my laptop, under Linux or macOS, the EBCDIC file shows other characters that look garbled. How can the Unix environment on the AS/400 print both correctly? I do not see anything like a file header that indicates the encoding. For example, 0x40 is @ in ASCII and a space in EBCDIC, yet cat prints the 0x40 bytes in hell.txt as spaces and those in hellascii.txt as @.
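To make the point in the question concrete, here is a short Python sketch (Python is not part of the original thread) that decodes the exact bytes from the od -x dumps above in different ways. cp037 (US EBCDIC) is an assumption, chosen because Python ships a codec for it; Latin-1 is a stand-in for what a non-EBCDIC system assumes when it shows the file.

```python
# Bytes copied from the `od -x` dumps in the question. The final 00 in the
# hellascii.txt dump is od padding (that file is 0o21 = 17 bytes), so it is dropped.
EBCDIC_BYTES = bytes.fromhex("40 40 40 40 88 85 93 93 96 40 7c 7c 7c 7c 7c 7c 7c 25")
ASCII_BYTES = bytes.fromhex("20 20 20 20 68 65 6c 6c 6f 20 40 40 40 40 40 40 40")

# Nothing in the bytes says which encoding they use; the reader must know it.
readings = {
    "hell.txt decoded as EBCDIC (cp037)": EBCDIC_BYTES.decode("cp037"),
    "hellascii.txt decoded as ASCII": ASCII_BYTES.decode("ascii"),
    # Misreading: treating the EBCDIC bytes as Latin-1, roughly what happens
    # when the file is viewed on a non-EBCDIC system.
    "hell.txt decoded as Latin-1": EBCDIC_BYTES.decode("latin-1"),
}

for label, text in readings.items():
    print(f"{label:<36} {text!r}")
```

The first two readings give the same text, four spaces followed by `hello @@@@@@@` (the EBCDIC one ends with a newline, because 0x25 is LF in EBCDIC). The Latin-1 reading turns every 0x40 into `@`, 0x7c into `|`, and the letters into unprintable control characters.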


Solution

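  • On IBM i, the system keeps track of the CCSID (Coded Character Set Identifier) assigned to each IFS file. The CCSID is stored as an attribute of the file, not inside its data, which is why you see no header in the bytes. QShell utilities such as cat use it to convert the file's data to the job's character set, so both files print the same text. You can see the CCSID by using the -C option of od. Here is an example.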

    $ od -tx -C helloascii.txt
    helloascii.txt CCSID = 819
    0000000  68692074 68657265 0a000000
    0000011
    $ od -tx -C helloebcdic.txt
    helloebcdic.txt CCSID = 256
    0000000  888940a3 88859985 25000000
    0000011
    

    You can assign the CCSID of a new file using the -C option of touch. Here is how I created the files used above.

    $ touch -C 819 helloascii.txt
    $ echo 'hi there' >> helloascii.txt
    $ touch -C 256 helloebcdic.txt
    $ echo 'hi there' >> helloebcdic.txt
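The mechanism can be modeled in a few lines of Python. This is a toy sketch, not IBM i's actual implementation, and `TaggedFile` and `CCSID_CODECS` are hypothetical names introduced only for illustration. It shows the idea from the answer: the encoding lives in metadata next to the bytes, and writes and reads consult that tag. Python has no codec for CCSID 256, so cp037 is assumed as a stand-in; for the characters used here it produces exactly the bytes in the od -tx -C output above.

```python
from dataclasses import dataclass

# Hypothetical CCSID -> Python codec table for this sketch only.
# CCSID 819 is ISO 8859-1. Python has no codec for CCSID 256, so cp037
# (CCSID 37, US EBCDIC) stands in; "hi there" plus newline encodes the
# same way in both, matching the od output in the answer.
CCSID_CODECS = {819: "latin-1", 37: "cp037", 256: "cp037"}


@dataclass
class TaggedFile:
    """Toy model of an IFS file: raw bytes plus a CCSID tag kept as
    metadata outside the data (so there is no header in the bytes)."""
    ccsid: int
    data: bytes = b""

    def append_text(self, text: str) -> None:
        # Like `echo 'hi there' >> file`: text is converted to the file's CCSID.
        self.data += text.encode(CCSID_CODECS[self.ccsid])

    def cat(self) -> str:
        # Like cat: the tag, not the bytes, decides how to decode.
        return self.data.decode(CCSID_CODECS[self.ccsid])


# Like `touch -C 819 ...` and `touch -C 256 ...`: create empty tagged files.
files = {
    "helloascii.txt": TaggedFile(ccsid=819),
    "helloebcdic.txt": TaggedFile(ccsid=256),
}
for f in files.values():
    f.append_text("hi there\n")

for name, f in files.items():
    # Like `od -tx -C`: show the tag and raw bytes, then the decoded text.
    print(f"{name} CCSID = {f.ccsid}  bytes = {f.data.hex()}  text = {f.cat()!r}")
```

Both files decode to `'hi there\n'`, while their raw bytes are `68692074686572650a` and `888940a38885998525`. The same single byte 0x40 would come back as `@` from the CCSID 819 file and as a space from the CCSID 256 file.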