How do I define the record structure of an EBCDIC file?

I have an EBCDIC file in HDFS. I want to load the data into a Spark DataFrame, process it, and write the results as ORC files. I found an open source solution, Cobrix, that can read data from EBCDIC files, but the developer must provide a copybook file, which is the schema definition.

A few lines of my EBCDIC file are shown in the attached image. I want to work out the copybook format for this file. Essentially, I want to read three fields: vin, which has length 17; vin_data, which has length 3; and vin_val, which has length 100.



  • Based on your comment in the question, and looking at the input file, you could start with this.

    01  VIN-RECORD.
        05  VIN                 PIC X(17).
        05  VIN-COUNT           PIC S9(5) COMP-3.
        05  VIN-VALUE           PIC X(100).

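    I'm guessing that the second field is COMP-3 (packed decimal) because, in all six examples, its last byte ends with a C nibble. A C sign nibble indicates a positive COMP-3 value, a D would indicate a negative value, and an F would indicate an unsigned value. PIC S9(5) COMP-3 packs five digits plus the sign nibble into 3 bytes, which matches the length of 3 you gave for the second field.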

    The content of the third field varies in length and is right-padded with spaces to fill the 100 bytes.
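
To check the layout against a few raw records before handing the copybook to Cobrix, the 120-byte record (17 + 3 + 100) can be decoded by hand. The following is a minimal sketch and not part of Cobrix. It assumes fixed-length records with no record headers and the cp037 EBCDIC code page; the names `unpack_comp3` and `parse_record` are made up for this illustration, and the sample record is synthetic.

```python
# Field widths taken from the copybook above.
VIN_LEN, COUNT_LEN, VALUE_LEN = 17, 3, 100
RECORD_LEN = VIN_LEN + COUNT_LEN + VALUE_LEN  # 120 bytes

# Assumption: the file uses the cp037 EBCDIC code page.
CODE_PAGE = "cp037"


def unpack_comp3(raw: bytes) -> int:
    """Decode a packed-decimal (COMP-3) field into an int.

    Every nibble holds one decimal digit, except the low nibble of the
    last byte, which holds the sign: D is negative; C (positive) and
    F (unsigned) are treated as non-negative here.
    """
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    if any(d > 9 for d in digits):
        raise ValueError("not a valid packed-decimal field")
    sign = raw[-1] & 0x0F
    value = int("".join(str(d) for d in digits))
    return -value if sign == 0x0D else value


def parse_record(record: bytes) -> dict:
    """Split one fixed-length record into the three copybook fields."""
    if len(record) != RECORD_LEN:
        raise ValueError(f"expected {RECORD_LEN} bytes, got {len(record)}")
    vin_end = VIN_LEN
    count_end = vin_end + COUNT_LEN
    return {
        "vin": record[:vin_end].decode(CODE_PAGE),
        "vin_count": unpack_comp3(record[vin_end:count_end]),
        # The text field is right-padded with EBCDIC spaces.
        "vin_value": record[count_end:].decode(CODE_PAGE).rstrip(" "),
    }


# Synthetic sample record, built only to exercise the parser.
sample = (
    "1HGCM82633A004352".encode(CODE_PAGE)
    + b"\x00\x04\x2C"                      # packed +42
    + "ABC".ljust(VALUE_LEN).encode(CODE_PAGE)
)
print(parse_record(sample))
```

If the decoded values look wrong for real records, the likely suspects are a different code page or a record length other than 120 bytes, in which case the copybook needs adjusting as well.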