mainframe dfsort

Remove multiple headers from a PS file


I have a PS file from which I want to remove a header if there is no data below it, i.e. if there are headers (recognized by the first three letters, HDR) on two consecutive lines, I want to remove the first one, as there is no data for it.

Input data

HDR20170123
HDR20170124
1.8988 ABCD
1.4324 PARE
HDR20170125
1.5432 URST

Desired Output

HDR20170124
1.8988 ABCD
1.4324 PARE
HDR20170125
1.5432 URST

Is there any way we can do this using DFSORT?


Solution

  • There are two techniques, and the JOINKEYS technique is easier to explain in a short space of time.

    You use JOINKEYS, with your data set name for both input files.

    You define JNFnCNTL data sets for both inputs, and in each of those you append a sequence number to each record. One sequence number (JNF1CNTL) you start from zero, the other (JNF2CNTL) you start from one. The sequence numbers need to be large enough to express the number of records you have.

    For the JOINKEYS key, you use the sequence numbers on the files.

    Use JOIN UNPAIRED,F2 (which will get you matches, and unmatched on F2).

    REFORMAT of F1:1,3,F2:1,80

    OMIT with COND= for the main task, where you get rid of the records where 1,3 is HDR and 1,3 matches 4,3 (the current record is a header, and so is the record that follows it).

    Then BUILD=(4,80) in the main task, to get rid of the first three bytes, which came from the following record.

    On the inputs your data will look like this, offset to represent the sequence numbers:

         F1          F2
    HDR20170123 
    HDR20170124 HDR20170123
    1.8988 ABCD HDR20170124
    1.4324 PARE 1.8988 ABCD
    HDR20170125 1.4324 PARE
    1.5432 URST HDR20170125
                1.5432 URST
    

    And on the REFORMAT:

    HDR 
    HDRHDR20170123
    1.8HDR20170124
    1.41.8988 ABCD
    HDR1.4324 PARE
    1.5HDR20170125
       1.5432 URST
    

    What you've achieved is the availability of data from the following record (its first three bytes here, or as much as you need for a given case) whilst you have the current record, so testing values against the following record is easy.

    Time for some code now:

    //SYSIN    DD * 
      OPTION COPY 
      JOINKEYS F1=INA,FIELDS=(81,6,A),SORTED,NOSEQCK
      JOINKEYS F2=INB,FIELDS=(81,6,A),SORTED,NOSEQCK
      JOIN UNPAIRED,F2 
      REFORMAT FIELDS=(F1:1,3, 
                      F2:1,80) 
      OMIT COND=(1,3,CH,EQ,C'HDR', 
                AND, 
                 1,3,CH,EQ,4,3,CH) 
      INREC BUILD=(4,80) 
    //JNF1CNTL DD * 
      INREC  OVERLAY=(81:SEQNUM,6,ZD, 
                         START=0) 
    //JNF2CNTL DD * 
      INREC  OVERLAY=(81:SEQNUM,6,ZD, 
                         START=1) 
    //INA      DD * 
    HDR20170123 
    HDR20170124 
    1.8988 ABCD 
    1.4324 PARE 
    HDR20170125 
    1.5432 URST 
    //INB      DD * 
    HDR20170123 
    HDR20170124 
    1.8988 ABCD 
    1.4324 PARE 
    HDR20170125 
    1.5432 URST 
    

    That produces your desired output:

    HDR20170124
    1.8988 ABCD
    1.4324 PARE
    HDR20170125
    1.5432 URST
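
    For anyone who wants to check the logic away from the mainframe, the steps above can be modelled in a few lines of Python. This is only a sketch of what the control cards do, not DFSORT itself: the function name drop_empty_headers is made up for illustration, records are plain strings rather than 80-byte fixed-length records, and the join is done with dictionaries rather than by merging two sorted streams.

```python
def drop_empty_headers(records, marker="HDR"):
    """Model of the offset self-join: drop a header directly followed by another header."""
    width = len(marker)
    # JNF1CNTL numbers the records from 0, JNF2CNTL numbers them from 1.
    f1 = {seq: rec for seq, rec in enumerate(records, start=0)}
    f2 = {seq: rec for seq, rec in enumerate(records, start=1)}

    output = []
    # JOIN UNPAIRED,F2: every F2 record, paired with F1 data when the key matches.
    for seq, current in f2.items():
        # REFORMAT: leading bytes of the F1 record (blanks when unpaired), then the F2 record.
        following = f1.get(seq, "")[:width].ljust(width)
        joined = following + current
        # OMIT COND: positions 1-3 are HDR and equal positions 4-6.
        if joined[:width] == marker and joined[:width] == joined[width:2 * width]:
            continue
        # BUILD=(4,80): strip the leading bytes again.
        output.append(joined[width:])
    return output


if __name__ == "__main__":
    sample = [
        "HDR20170123",
        "HDR20170124",
        "1.8988 ABCD",
        "1.4324 PARE",
        "HDR20170125",
        "1.5432 URST",
    ]
    for line in drop_empty_headers(sample):
        print(line)
```

    Like the control cards, this model keeps a header that happens to be the very last record, because there is no following header to trigger the OMIT.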
    

    A JOINKEYS operation will consist of three "tasks" operating concurrently. The "main task" is an entirely normal SORT step, consisting of whatever you want.

    There are two sub-tasks, one for each of the input data sets. Each of those sub-tasks can have further control cards supplied to modify its data. These are specified on JNFnCNTL DDs: JNF1CNTL and JNF2CNTL. You may supply neither, either one, or both, as the actual requirement dictates. Here you want both.

    JNFnCNTL data sets must only contain a subset of normal control cards. They may not contain OUTREC nor OUTFIL. This is because they inter-operate with the Main Task at exactly the point where OUTREC could otherwise exist.

    On the JOINKEYS statement, specify SORTED,NOSEQCK. This is because, by default, JOINKEYS sorts each input data set on the key for the match. Here the sort is not needed (SORTED), and the sequence does not need to be checked (NOSEQCK), because the key is a sequence number and so is already guaranteed to be in order.

    The REFORMAT statement should only include data that is required in the main task. Here all that is needed is the location where HDR may exist, and the full record from F2.

    JOIN UNPAIRED,F2 will obtain all matched records, and all F2 which do not match (there will only be one, the final record, because the match is on the sequence number, offset by one).

    To understand this (or any DFSORT data manipulation) further, make amendments to show the data at intermediate stages. Here there is only one stage, so it is simple:

    //SYSIN    DD * 
      OPTION COPY 
      JOINKEYS F1=INA,FIELDS=(81,6,A),SORTED,NOSEQCK
      JOINKEYS F2=INB,FIELDS=(81,6,A),SORTED,NOSEQCK
      JOIN UNPAIRED,F2 
      REFORMAT FIELDS=(F1:1,12,81,6,12,1, 
                       F2:1,12,81,6,12,1, 
                       ?) 
    //JNF1CNTL DD * 
      INREC  OVERLAY=(81:SEQNUM,6,ZD, 
                         START=0) 
    //JNF2CNTL DD * 
      INREC  OVERLAY=(81:SEQNUM,6,ZD, 
                         START=1) 
    //INA      DD * 
    HDR20170123 
    HDR20170124 
    1.8988 ABCD 
    1.4324 PARE 
    HDR20170125 
    1.5432 URST 
    //INB      DD * 
    HDR20170123 
    HDR20170124 
    1.8988 ABCD 
    1.4324 PARE 
    HDR20170125 
    1.5432 URST 
    

    Produces this output:

    HDR20170124 000001 HDR20170123 000001 B
    1.8988 ABCD 000002 HDR20170124 000002 B
    1.4324 PARE 000003 1.8988 ABCD 000003 B
    HDR20170125 000004 1.4324 PARE 000004 B
    1.5432 URST 000005 HDR20170125 000005 B
                       1.5432 URST 000006 2
    

    Since you only show 11 bytes of data in an 80-byte record, on this REFORMAT statement the first 12 positions are taken (the 12th to leave a blank) and 12,1 is also used as a separator (literals cannot be used in a REFORMAT). The respective sequence numbers are also shown, as is the built-in match marker, the ? (question mark) in the REFORMAT: B for both files, 2 for F2 only. No 1 is shown, as the JOIN statement only asks for matches and unmatched F2 records.
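
    The marker column can be reasoned about without running anything. The Python sketch below uses hypothetical helper names and models only the keys, not the record data: it lists which marker each sequence number would get for the offset self-join, and which of those JOIN UNPAIRED,F2 lets through.

```python
def match_markers(record_count):
    """Return (key, marker) pairs for the offset self-join, before any JOIN filtering."""
    f1_keys = set(range(0, record_count))      # JNF1CNTL: START=0
    f2_keys = set(range(1, record_count + 1))  # JNF2CNTL: START=1
    markers = []
    for key in sorted(f1_keys | f2_keys):
        if key in f1_keys and key in f2_keys:
            markers.append((key, "B"))  # key present on both files
        elif key in f1_keys:
            markers.append((key, "1"))  # key on F1 only
        else:
            markers.append((key, "2"))  # key on F2 only
    return markers


def join_unpaired_f2(markers):
    """Model of JOIN UNPAIRED,F2: keep paired keys and keys found on F2 only."""
    return [(key, marker) for key, marker in markers if marker in ("B", "2")]


if __name__ == "__main__":
    for key, marker in join_unpaired_f2(match_markers(6)):
        print(f"{key:06d} {marker}")
```

    For the six sample records, only key 0 is on F1 alone, so it is the single record the JOIN statement leaves out, and key 6 is the single record marked 2.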