Search code examples
cobolgnucobol

COBOL: Simple file reading issue


I have a very basic COBOL program that reads a file input.dat and simply outputs it in the console. The input.dat file looks like this:

John                Johnson             
Peter               Peterson            
Juliette            Julietteson         
Natasha             Natashason          
Justin              Justinson           

It isn't correctly displayed here but I'm positive there are 20 chars for the first name and 20 chars for the last name.

This is my COBOL program:

    IDENTIFICATION DIVISION.
    PROGRAM-ID. ATEST4.
    ENVIRONMENT DIVISION.
    INPUT-OUTPUT SECTION.
    FILE-CONTROL.
        SELECT INPUTFILE ASSIGN TO "files/input.dat".
    DATA DIVISION.
    FILE SECTION.
    FD  INPUTFILE LABEL RECORDS ARE OMITTED.
    01  INPUTRECORD              PIC X(40).    
    WORKING-STORAGE SECTION.
    01  FILE-STATUS              PIC 9 VALUE 0.
    PROCEDURE DIVISION.
    001-MAIN.
        OPEN INPUT INPUTFILE.
        PERFORM 002-READ UNTIL FILE-STATUS = 1.
        CLOSE INPUTFILE.
        STOP RUN.
            
    002-READ.
        READ INPUTFILE
            AT END MOVE 1 TO FILE-STATUS
            NOT AT END DISPLAY INPUTRECORD
        END-READ.  

Instead, the output looks like this:

John                Johnson             
Peter               Peterson            
Juliette            Julietteson         
Natasha             Natashason          
Justin              Justinson           
ustin              Justinson       

The last line seems to be a copy of the previous one, with a missing first character and several less trailing spaces (it totals to 35 chars).

Why does this happen? It seems like a misconception of the AT END clause, but I can't get around it.

Edit: updated compiler as suggested. Still the same result. Here is a link for my input file, if it helps


Solution

  • OK, missed a trick. Or two. You are using fixed-length records of 40 bytes. When you use fixed-length records, unlike line-sequential, there is no single-trailing-null-stripping on READ and null-appending on WRITE.

    I also pasted your data from the question, and it arrived to me as 40-byte records including the null record-delimiter.

    Now that I have your real data...

    Instead of five records of 40 bytes, you have five of 41. If you consider that a a lump of data, which the COBOL program is going to read 40 bytes at a time, that gives you five records of 40 bytes, and one of five.

    Without nulls being appended to the records, I should see all the output data as one long line. But I don't. Why?

    There are this time, with the "long" records, there are leading record-delimiters of null, on all but the first record.

    Here's some data for you to test with:

    John                Johnson   0123456789
    Peter               Peterson  0123456789
    Juliette            Julietteson123456789
    Natasha             Natashason0123456789
    Justin              Justinson 0123456789
    1234511111111111111111111111111111111110
    

    That is intended to be 40 bytes of data per record, followed by a null record-terminator.

    Here's your modified program to compile and run the data through. Rather than fixing up the columns after pasting from your question (pasting from SO is not great for COBOL) I used `cobc -x -free prog.cob. And since it was busted, didn't pay much attention to where I jammed the new stuff in.

    The point of the ">" and "<" in the DISPLAYs is to bound the field. Then you can identify the position of the nulls, as they cause a break.

    IDENTIFICATION DIVISION.
    PROGRAM-ID. FILE-TEST.
    ENVIRONMENT DIVISION.
    INPUT-OUTPUT SECTION.
    FILE-CONTROL.
    SELECT INPUTFILE ASSIGN TO "files/input.dat"
    file status is fs1.
    SELECT OUTPUTFILE ASSIGN TO "files/output.dat"
    file status is fs2.
    DATA DIVISION.
    FILE SECTION.
    FD INPUTFILE
        LABEL RECORDS ARE OMITTED
        record is varying depending on record-length.
    01 INPUTRECORD PIC X(40).
    FD OUTPUTFILE
        LABEL RECORDS ARE OMITTED.
    01 OUTPUTRECORD PIC X(40).
    WORKING-STORAGE SECTION.
        01 EOF PIC 9 VALUE 0.
        01  fs1 pic xx.
            88  fs1-ok value zero.
            88  fs1-eof value "10".
        01  fs2 pic xx.
            88  fs2-ok value zero.
        01 CUSTOMER.
            02 FIRST-NAME PIC X(20).
            02 LAST-NAME PIC X(20).
        01  rec-count comp-3 pic 999 value zero.
    PROCEDURE DIVISION.
        001-MAIN.
            OPEN INPUT INPUTFILE OUTPUT OUTPUTFILE.
            if not fs1-ok
                display "bad fs1 O>" fs1 "<"
            end-if
            if not fs2-ok
               display "bad fs2 O>" fs2 "<"
            end-if
            PERFORM 002-READWRITELOOP UNTIL EOF = 1.
            CLOSE INPUTFILE. 
           if not fs1-ok
               display "bad fs1 C>" fs1 "<"
           end-if
           CLOSE OUTPUTFILE.
           if not fs2-ok
               display "bad fs2 C>" fs2 "<"
           end-if
           STOP RUN.
    
       002-READWRITELOOP.
           READ INPUTFILE INTO CUSTOMER
               AT END MOVE 1 TO EOF
                  display "at end"
           if not fs1-ok
               display "bad fs1 R>" fs1 "< " rec-count
           end-if
               NOT AT END WRITE OUTPUTRECORD FROM CUSTOMER
          DISPLAY ">" CUSTOMER "<"
          DISPLAY ">" inputrecord "<"
          add 1 to rec-count
           display rec-count
           if not fs1-ok
               display "bad fs1 R>" fs1 "< " rec-count
           end-if
           if not fs2-ok
               display "bad fs2 W>" fs2 "<"
           end-if
           END-READ
          .
    

    I will try to understand why the last "line" appears like it does, as in your original, and take it up in the GnuCOBOL discussion area.

    The fix: either use LINE SEQUENTIAL on the SELECT and leave your data as it is; or, remove all the null/new-lines from your data. Either of these will give your 40 bytes of correctly lined-up data for the (now) six records in the input.


    OK, you're going to like this.

    I changed your original program by adding the FILE STATUS clause to both the SELECT statements.

    I tested the file-status fields I defined after each IO (OPEN, CLOSE, READ and WRITE).

    The OPENs and CLOSEs give a file-status of "00". Expected.

    The first four READs give a file-status of "00". Expected.

    The five WRITEs give a file-status of "00". Expected.

    The fifth READ gives a file-status of "04". Which means:

    A READ statement was successfully executed, but the length of the record being processed did not conform to the fixed file attributes for that file.
    

    So, expected. NeitherAT END nor NOT AT END are concerned about that.

    Your program could have known you had read a short, or a long, record, if you had used the FILE STATUS.

    How are there six output records if there are only five WRITE statements executed?

    Well, because you have 35 bytes of data plus a "new line", because the new-line will only be stripped after the 40th byte, when you DISPLAY the data, you get two lines. On the file, there is one "record", but it has an embedded newline. Instead of using an editor which would show me the hex values, I used cat, so saw a "sixth record", and then a text editor, and again saw the sixth record.

    Exactly why you saw an almost entire "sixth record" I don't know, but it is of historical interest. If you want to look at the OpenCOBOL source to attempt to discover why, you can find it in the Files section of the GnuCOBOL site.

    With GnuCOBOL, you DISPLAY or WRITE the 40-byte field, with an embedded blank. The DISPLAY will always do a line-break on the embedded "null" value, which gave me the apparent 35-byte followed by four-byte records, the 40th byte (actually the 36th) is the "invisible" null.

    The WRITE would not cause a line-break with the embedded null, until you use something to "look at" the file which is expecting the data to be text rather than binary.

    The "problem" in GnuCOBOL is not a problem, it is the way DISPLAY works (which is expecting text data, not binary) or, if using WRITE, the way you then "view" the file.

    The actual OpenCOBOL output you got is effectively a bug, but it cannot be reproduced in GnuCOBOL.

    The GnuCOBOL output from your program with your data (last record 35 bytes of data) can be explained. The spaces I got we because of COBOL's "padding" when you move a shorter field to a longer field. READ ... INTO ... contains an implicit MOVE, so you get the padding.

    If you had just used the record from the data-area under the FD, you'd have got more unpredictable results.

    So it does make the question on-topic. I think. So the question should stay, as others will almost certainly have a similar problem at some time. The problem being the use of DISPLAY with non-text data, or viewing output files with text-only tools. Or does that last mean it would be off-topic with WRITE? :-)

    Resolution is two-fold. Upgrade to GnuCOBOL. Use the FILE STATUS always, and always check the (correct) file-status field (one for each file is best) after each IO, and take some sensible action when something unexpected happens.

    The file-status value for end-of-file is "10". I add 88s to the file-status fields, and always use the "10" for end-of-file checking. I never use the tangle of AT END/NOT AT END.

    If you'd used Gilbert's suggestion, with the priming read, and without the INTO I think you'd have got different results, which would have assisted resolution. The priming read (always have a current record before entering your read loop, then read the next record (or get end-of-file) as the last logical thing in the loop) is a more "COBOL" way to do things. As is the FILE STATUS and the 88s on the file-status fields, and the checking each time.

    You could also look at using LINE SEQUENTIAL on the SELECT. This is a more natural type of file for Linux/Unix/Windows, "records" are then known to be delimited, and you'd have got valid records of 40, 40, 40, 40 and 35 bytes.

    Then you have variable-length records, and you would need to know how to deal with those.


    This type of question, a problem with the data, is normally off-topic at Stack Overflow.

    However, the behaviour is not correct, there is reason to expect that you wouldn't get the final five bytes of the last record (it looked like you did originally, because you used READ INTO). However, an error with the data like that should not get your program to think there is an extra record.

    I will raise the issue on the GnuCOBOL discussion pages, where (disclosure) I am a Moderator.

    I would suggest you upgrade to GnuCOBOL 1.1.0, which has a number of bug-fixes over OpenCOBOL 1.1, and which is actively developed (GnuCOBOL is the new name for OpenCOBOL, so OpenCOBOL itself is no longer developed).

    A couple of points on your code.

    The structure suggested in a previous answer by Gilbert LeBlanc,to this question, unjustly downvoted, is much better in a COBOL program.

    Using an actual FILE STATUS on the SELECT statement for a file is much better than using AT END and its relatives. You test then test (with an 88 preferably) the file status after each IO, so you can identify a problem as soon as it has happened. I will test whether this would have assisted in this instance anyway.