Search code examples
c#coboldata-conversion

Read COBOL COMP data in C#


I'm new to COBOL and I have been trying to read record information from a text file that is an output from the table.

Most non-comp data-types i'm okay with, it's the 'COMP' ones i'm getting stuck on.

I've been trying to figure this out all day today reading as much as I can on this.

The date fields below are the ones I can't convert to a date-string:

05 VALDATE          PIC 9(6) COMP
05 PAYDATE          PIC 9(6) COMP
05 SYSDATE          PIC 9(6) COMP

From my understanding all those types above are going to be 4 bytes each in the file.

They are supposed to be dates that should represent YYMMDD, but the data just doesn't seem to be as small as that. I've looked into EBCDIC and reversing the byte[] data and using BitConverter.ToUNIT32() and changing the Encoding used to read the file with no luck.

I read that dates that are computed into an integer are stored as the number of days from Jan 1st 1601, hence why code below is trying to add the value to 1601. (http://www.techtricky.com/cobol-date-functions-list-add-find-duration/)

My issue is that the either the data from the text file just isn't right or i'm missing a step to get what should be a date similar to YYMMDD.

The data for the 3 above are as follows:

[ 32] [237] [ 44] [  4] | 00100000 11101101 00101100 00000100
[ 33] [ 14] [ 32] [237] | 00100001 00001110 00100000 11101101
[131] [ 48] [ 48] [ 48] | 10000011 00110000 00110000 00110000

And how i'm opening the file, I've changed the encoding to ascii with no luck:

 using (BinaryReader reader = new BinaryReader(File.Open(nFilePath, FileMode.Open), Encoding.Default))

Code used to try and read the COMP fields:

  public class DateFromUIntExtractor : LineExtractor
  {
    public DateFromUIntExtractor() : base(4)
    {
    }

    public override string ExtractText(BinaryReader nReader)
    {
      // e.g 32,237,44,44, included but commented out things i've tried
      byte[] data = nReader.ReadBytes(Length); // Length = 4

      //Array.Reverse(data); - Makes num = 552414212
      //data = ConvertAsciiToEbcdic(data);

      int num = BitConverter.ToUInt32(data, 0);
      // in this example num = 70053152

      DateTime date = new DateTime(1601,1,1);
      date = date.AddDays(num); // Error : num is too big

      Extract = date.ToString("yyyyMMdd");
      return Extract;
    }
  }

Is the data malformed? Or am i missing something?

UPDATE

The task i'm trying to accomplish is to replicate a COBOL program that transforms the data from one definition into another but in CSV format, as the program outputs a .dat file.

Source

My inexperienced interpretation of the source definition is that the data in the text file is either a PUA-ICGROUP or PUA-PUGROUP. Looking at the COBOL program it chooses PUA-ICGROUP when PUA-HEADER>PUA-KEY>PUA-RTYPE = "03", everything else is PUA-PUGROUP.

C-WRITE-START.
    IF  PUA-RTYPE = 3 THEN
        PERFORM C-WRITE-A
    ELSE
        PERFORM C-WRITE-B
    END-IF.

C-WRITE-EXIT.
    EXIT.

Definition

01  DLRPUARC.
    03  PUA-HEADER.
        05  PUA-KEY.
            07  PUA-CDELIM         PIC 99.
            07  PUA-SUPNO          PIC 9(7).
            07  PUA-RTYPE          PIC 99.
            07  PUA-REF            PIC 9(9).
            07  PUA-SEQ            PIC 999.
        05  PUA-ALTKEY.
            07  PUA-ACDELIM        PIC 99.
            07  PUA-ASUPNO         PIC 9(7).
            07  PUA-ATRNDATE       PIC 9(6).
            07  PUA-ARTYPE         PIC 99.
            07  PUA-AREF           PIC 9(9).
            07  PUA-ASEQ           PIC 999.
        05  FILLER                 PIC X(82).

    03  PUA-ICGROUP REDEFINES PUA-HEADER.
        05  FILLER                 PIC X(52).
        05  PUA-ICEXTREF           PIC X(10).
        05  PUA-ICORDNO            PIC 9(11).
        05  PUA-ICVALDATE          PIC 9(6) COMP.
        05  PUA-ICPAYDATE          PIC 9(6) COMP.
        05  PUA-ICSYSDATE          PIC 9(6) COMP.
        05  PUA-ICTRNVAL           PIC S9(9).
        05  PUA-ICCLRREF           PIC 9(6).
        05  PUA-ICDELDATE          PIC 9(6) COMP.
        05  PUA-ICOTHQRY           PIC X.
        05  PUA-ICPRCQRY           PIC X.
        05  PUA-ICMRSQRY           PIC X.
        05  PUA-ICDSCTYPE          PIC 9.
        05  PUA-ICDSCVAL           PIC S9(9) COMP.
        05  PUA-ICVATCODE          PIC 9.
        05  PUA-ICVATAMT           PIC S9(8) COMP.
        05  PUA-ICTAXAMT           PIC S9(8) COMP.
        05  PUA-ICMRSREF           PIC 9(6).
        05  PUA-ICSUBDIV           PIC 9.
        05  PUA-ICCOSTCTR          PIC X(5).
        05  PUA-ICSEQIND           PIC X.
        05  FILLER                 PIC X(4).

    03  PUA-PUGROUP REDEFINES PUA-HEADER.
        05  FILLER                 PIC X(52).
        05  PUA-PUEXTREF           PIC X(10).
        05  PUA-PUORDNO            PIC 9(11).
        05  PUA-PUVALDATE          PIC 9(6) COMP.
        05  FILLER                 PIC XXX.
        05  PUA-PUSYSDATE          PIC 9(6) COMP.
        05  PUA-PUTRNVAL           PIC S9(9).
        05  PUA-PUCLRREF           PIC 9(6).
        05  PUA-PUDELDATE          PIC 9(6) COMP.
        05  PUA-PUOTHQRY           PIC X.
        05  PUA-PUSUBDIV           PIC 9.
        05  FILLER                 PIC X(32).

Output Definition

01  OUT-A-REC.
    03  OUT-A-PUA-CDELIM             PIC 99.
    03  OUT-A-PUA-SUPNO              PIC 9(7).
    03  OUT-A-PUA-RTYPE              PIC 99.
    03  OUT-A-PUA-REF                PIC 9(9).
    03  OUT-A-PUA-SEQ                PIC 999.
    03  OUT-A-PUA-ATRNDATE           PIC 9(8).
    03  OUT-A-PUA-ICEXTREF           PIC X(10).
    03  OUT-A-PUA-ICORDNO            PIC 9(11).
    03  OUT-A-PUA-ICVALDATE          PIC 9(8).
    03  OUT-A-PUA-ICPAYDATE          PIC 9(8).
    03  OUT-A-PUA-ICSYSDATE          PIC 9(8).
    03  OUT-A-PUA-ICTRNVAL           PIC S9(9) SIGN LEADING SEPARATE.
    03  OUT-A-PUA-ICCLRREF           PIC 9(6).
    03  OUT-A-PUA-ICDELDATE          PIC 9(8).
    03  OUT-A-PUA-ICOTHQRY           PIC X.
    03  OUT-A-PUA-ICPRCQRY           PIC X.
    03  OUT-A-PUA-ICMRSQRY           PIC X.
    03  OUT-A-PUA-ICDSCTYPE          PIC 9.
    03  OUT-A-PUA-ICDSCVAL           PIC S9(9) SIGN LEADING SEPARATE.
    03  OUT-A-PUA-ICVATCODE          PIC 9.
    03  OUT-A-PUA-ICVATAMT           PIC S9(8) SIGN LEADING SEPARATE.
    03  OUT-A-PUA-ICTAXAMT           PIC S9(8) SIGN LEADING SEPARATE.
    03  OUT-A-PUA-ICMRSREF           PIC 9(6).
    03  OUT-A-PUA-ICSUBDIV           PIC 9.
    03  OUT-A-PUA-ICCOSTCTR          PIC X(5).
    03  OUT-A-PUA-ICSEQIND           PIC X.
    03  OUT-A-CTRL-M                 PIC X.
    03  OUT-A-NL                     PIC X.

FD  F-OUTPUTB
    LABEL RECORDS OMITTED.

01  OUT-B-REC.
    03  OUT-B-PUA-CDELIM             PIC 99.
    03  OUT-B-PUA-SUPNO              PIC 9(7).
    03  OUT-B-PUA-RTYPE              PIC 99.
    03  OUT-B-PUA-REF                PIC 9(9).
    03  OUT-B-PUA-SEQ                PIC 999.
    03  OUT-B-PUA-ATRNDATE           PIC 9(8).
    03  OUT-B-PUA-PUEXTREF           PIC X(10).
    03  OUT-B-PUA-PUORDNO            PIC 9(11).
    03  OUT-B-PUA-PUVALDATE          PIC 9(8).
    03  OUT-B-PUA-PUSYSDATE          PIC 9(8).
    03  OUT-B-PUA-PUTRNVAL           PIC S9(9) SIGN LEADING SEPARATE.
    03  OUT-B-PUA-PUCLRREF           PIC 9(6).
    03  OUT-B-PUA-PUDELDATE          PIC 9(8).
    03  OUT-B-PUA-PUOTHQRY           PIC X.
    03  OUT-B-PUA-PUSUBDIV           PIC 9.
    03  OUT-B-CTRL-M                 PIC X.
    03  OUT-B-NL                     PIC X.

PROGRAM

Below is a small extract of what the cobol program is doing to the dates regardless of weather their source is COMP or not. (i did not write this code). it seems to be trying to fix the 2kY issue.

IF  PUA-ATRNDATE IS ZERO THEN
    MOVE ZERO                TO OUT-A-PUA-ATRNDATE
ELSE
    MOVE PUA-ATRNDATE        TO W-DATE-6DIGIT
    MOVE W-DATE-SEG1         TO W-DATE-YY
    MOVE W-DATE-SEG2         TO W-DATE-MM
    MOVE W-DATE-SEG3         TO W-DATE-DD
    IF W-DATE-YY > 50 THEN
        MOVE "19"            TO W-DATE-CC
    ELSE
        MOVE "20"            TO W-DATE-CC
    END-IF
    MOVE W-DATE-CCYYMMDD     TO OUT-A-PUA-ATRNDATE
END-IF.

MOVE PUA-ICEXTREF            TO OUT-A-PUA-ICEXTREF.
MOVE PUA-ICORDNO             TO OUT-A-PUA-ICORDNO.

IF  PUA-ICVALDATE IS ZERO THEN
    MOVE ZERO                TO OUT-A-PUA-ICVALDATE
ELSE
    MOVE PUA-ICVALDATE       TO W-DATE-6DIGIT
    MOVE W-DATE-SEG1         TO W-DATE-YY
    MOVE W-DATE-SEG2         TO W-DATE-MM
    MOVE W-DATE-SEG3         TO W-DATE-DD
    IF W-DATE-YY > 50 THEN
        MOVE "19"            TO W-DATE-CC
    ELSE
        MOVE "20"            TO W-DATE-CC
    END-IF
    MOVE W-DATE-CCYYMMDD     TO OUT-A-PUA-ICVALDATE
END-IF.

Program Working-Storage Section

01  W-DATE-6DIGIT               PIC 9(6).
01  W-DATE-6DIGIT-REDEF REDEFINES W-DATE-6DIGIT.
    03  W-DATE-SEG1             PIC 99.
    03  W-DATE-SEG2             PIC 99.
    03  W-DATE-SEG3             PIC 99.

01  W-DATE-CCYYMMDD             PIC 9(8).
01  W-DATE-CCYYMMDD-REDEF REDEFINES W-DATE-CCYYMMDD.
    03  W-DATE-CC               PIC 99.
    03  W-DATE-YY               PIC 99.
    03  W-DATE-MM               PIC 99.
    03  W-DATE-DD               PIC 99.

The DATA

Copied from Notepad++, each line starts at '220...' and end column is 135 before going onto next line, meaning length is 134(?)

    2200010010300005463400022000100106062003000054634000062703    09720200000 í,! íƒ00056319D001144ÕšNNN0    1 G¨    000000197202G    
    2200010010300005463500022000100106062903000054635000062858    09720200000 í, í" íƒ00082838{050906±RNNN0    1 áð    000000197202G    
    2200010010300005465500022000100106073003000054655000063378    09720200000 í í† í00179637A050906±RNNN0    1     000000197202G    

Noticed that above is missing some symbols:

    2200010010300005463400022000100106062003000054634000062703    09720200000 í,[EOT]![SO] íƒ00056319D001144[SOH]ÕšNNN0    1 [SOH]G¨    000000197202G    
    2200010010300005463500022000100106062903000054635000062858    09720200000 í, í" íƒ00082838{050906[SOH]±RNNN0    1 [SOH]áð    000000197202G    
    2200010010300005465500022000100106073003000054655000063378    09720200000 í í† í00179637A050906[SOH]±RNNN0    1 [EOT][NAK][EM]    000000197202G    

Update 2

I've acepted Rick Smith's answer below as when i put his data in i get the correct date-time values. So either my data is fudged or its somthing else as my data throws errors or date-time values 1000s of years in the future.

I've been able to get the ouput CSV of what these date time should actually be which are:

[using : int n = ((b[0] << 16) + (b[1] << 8) + b[2]);]

HEX: 0x20 0xED 0x2C
BIN: 32   237  44
INT: 2157868     (longer than 6 digit)
Actual DATE: 2006-07-16

HEX: 0x04 0x21 0x0e
BIN: 4    33   14
INT: 270606      (correct but segments are in reverse)
Actual DATE: 2006-06-27

HEX: 0x20 0xED 0x83
BIN: 32   237  131
INT: 2157955     (longer than 6 digits)
Actual DATE: 2006-08-03

Update 3

turns out it was bad data...


Solution

  • I created a COBOL file with three date fields using the values contained in the first record of the sample data. The first and third dates are YYMMDD, the second is DDMMYY.

    The dates are given in the code and have the same format as the dates you are trying to read, 3-bytes, big-endian binary.

       environment division.
       input-output section.
       file-control.
           select out-file assign "dates.dat"
               organization sequential
           .
       data division.
       file section.
       fd out-file.
       01 date-rec.
         02 date-1 comp pic 9(6).
         02 date-2 comp pic 9(6).
         02 date-3 comp pic 9(6).
       procedure division.
       begin.
           open output out-file
           move 060716 to date-1
           move 270606 to date-2
           move 060803 to date-3
           write date-rec
           close out-file
           stop run
           .
    

    This C# program then reads those dates with a BinaryReader and displays the bytes, the decimal value, and the converted date. Notice that I commented /*, Encoding.Default*/, since it is not needed.

    using System;
    using System.Globalization;
    using System.IO;
    
    namespace ConsoleApp1
    {
        class Program
        {
            static void Main(string[] args)
            {
                byte[] b = { 0, 0, 0 };
                string s;
                DateTime d = new DateTime();
                using (BinaryReader reader = new BinaryReader(File.Open(@"y:\dates.dat", FileMode.Open)/*, Encoding.Default*/))
                {
                    for (int i = 0; i < 3; i++)  // Three dates in file
                    {
                        b = reader.ReadBytes(b.Length);
                        Console.WriteLine("Bytes: {0}, {1}, {2}", b[0].ToString("X2"), b[1].ToString("X2"), b[2].ToString("X2"));
                        int n = ((b[0] << 16) + (b[1] << 8) + b[2]);
                        s = n.ToString("D6");
                        switch (i)
                        {
                            case 0:
                            case 2:
                                Console.WriteLine("Date(YYMMDD): {0}", s);
                                d = DateTime.ParseExact(s, "yyMMdd", CultureInfo.InvariantCulture);
                                Console.WriteLine("Date(yyyyMMdd): {0}", d.ToString("yyyyMMdd"));
                                break;
                            case 1:
                                Console.WriteLine("Date(DDMMYY): {0}", s);
                                d = DateTime.ParseExact(s, "ddMMyy", CultureInfo.InvariantCulture);
                                Console.WriteLine("Date(yyyyMMdd): {0}", d.ToString("yyyyMMdd"));
                                break;
                            default:
                                break;
                        }
                        Console.WriteLine("");
                    }
                }
            }
        }
    }
    

    This is the output:

    Bytes: 00, ED, 2C
    Date(YYMMDD): 060716
    Date(yyyyMMdd): 20060716
    
    Bytes: 04, 21, 0E
    Date(DDMMYY): 270606
    Date(yyyyMMdd): 20060627
    
    Bytes: 00, ED, 83
    Date(YYMMDD): 060803
    Date(yyyyMMdd): 20060803
    

    The DateTime.ParseExact(s, "yyMMdd", CultureInfo.InvariantCulture); was from an answer to this question, String to DateTime conversion as per specified format.