mainframe dfsort

Sort on length of field


I want to write a sort JCL to sort a file with variable-length records.

Input file:

Mark aaaaaaa
Amy bbbbbb
Paula ccccccccccc

The first requirement is to sort on the length of the field before the spaces, in ascending order. That is, sort on the length of the first column/word (Mark, Amy, etc.).

The second requirement is to sort on the field after the spaces (aaaaaaa, bbbbbb, ccccccccccc) in descending alphabetical order, except that any record whose field contains a vowel must always come first, followed by the rest of the data. For the input file above, the expected output is:

Mark aaaaaaa
Paula ccccccccccc
Amy bbbbbb

Here the first record, whose second field contains a vowel (aaaaaaa), is at the top, and the rest of the data is sorted in descending order. I want to achieve this.


Solution

  • What you are asking is not at all a simple thing :-)

    Whilst DFSORT has much intrinsic functionality, finding the length of a sequence of non-space characters is not available.

    So you have to roll-your-own.

    Although the task is also possible with fixed-length records (different technique) it is easier with variable-length records.

    Because the fields are variable-length as well, you'll need PARSE to separate the fields. For variable-length or variably-located fields, PARSE is usually the answer.

    PARSE creates fixed-length parsed fields, so you have to know the maximum lengths of your text. In this example 30 is chosen for each.

    The solution will develop piece by piece, because you will need to be secure in your understanding of it. The pieces are presented as "stand alone" code which you can run and see what happens:

    OPTION COPY

      INREC IFTHEN=(WHEN=INIT, 
                     PARSE=(%01=(ENDBEFR=C' ',
                                 FIXLEN=30), 
                            %02=(FIXLEN=30))),
    
    
            IFTHEN=(WHEN=INIT, 
                     BUILD=(1,4,%01,%02)) 
    

    If you run that, you will get this output:

    MARK                          AAAAAAA                       
    AMY                           BBBBBB                        
    PAULA                         CCCCCCCCCCC                   
    

    INREC runs before a SORT, so to make any changes to the data before a SORT, you use INREC. OUTREC runs after SORT, and OUTFIL after OUTREC.

    For now, the BUILD is just to show that the PARSEd fields contain the output you want (don't worry about the upper case: if your input is mixed-case, the output will be mixed-case too).

    WHEN=INIT means "do this for each record, before the following IFTHEN statements (if any)". You can use multiple WHEN=INIT, and you have to use multiple IFTHEN of some type to transform data in multiple stages.

    The 1,4 in the BUILD is for the Record Descriptor Word (RDW) which each variable-length record has. It is always necessary when creating a variable-length current record in SORT, but we'll use it for another purpose here as well.

    The next stage is to "extend" the records, because we need two fields to SORT on. For a variable-length record, you extend "at the front". In general:

    BUILD=(1,4,extensionstuff,5)
    

    This makes a new version of the current record, with first the RDW from the old current record, then "does some stuff" to create the extension, then copies from position 5 (the first data-byte on a variable-length record) to the end of the record.

    Although the RDW is "copied", the value of the RDW at the time is irrelevant, as it will be calculated for the BUILD. It just must be an RDW to start with, you can't just put anything there except an actual RDW.

    Another component that will be needed is to extend the records for the SORT key. We need the length of the first field, and we need a "flag" for whether or not to "sort early" for the second field containing a vowel. For the length it will be convenient to have a two-byte binary value. For now, we are just reserving bytes for the things:

    OPTION COPY
    INREC BUILD=(1,4,2X,2X,X,5)
    

    The 2X is two blanks, the X is one blank, so a total of five blanks. It could have been written as 5X, and in the final code is best that way, but for now it is clearer. Run that and you will see your records prefixed by five blanks.
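
    As a sketch of that equivalent form, here is the same reservation written with a single 5X. It is simply the statement above restated, and should again show your records prefixed by five blanks:

    OPTION COPY
    INREC BUILD=(1,4,5X,5)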

    There are two tasks. The length of the first field, and whether the second field contains a vowel.

    The key to the first task is to replace blanks from the PARSEd field with "nothing". This will cause the record to be shortened by one for each blank replaced. Save the length of the current record before the replacement; afterwards, the new (shorter) record-length minus the saved length, plus the fixed-length (30), reveals the length of the data.

    The key to the second task applies a similar technique. This time, change the second PARSEd field such that a, e, i, o, u are replaced by "nothing". Then if the length is the same as the original, there were no vowels.

    The FINDREP will look something like this:

         IFTHEN=(WHEN=INIT, 
                  FINDREP=(IN=C' ', 
                           OUT=C'', 
                           STARTPOS=n1, 
                           ENDPOS=n2)),
    

    You'll need a variant for the vowels:

         IFTHEN=(WHEN=INIT, 
                  FINDREP=(IN=(C'A',C'E',C'I',C'O',C'U'), 
                           OUT=C'', 
                           STARTPOS=n1, 
                           ENDPOS=n2)),
    

    To run:

      OPTION COPY 
    
      INREC IFTHEN=(WHEN=INIT, 
                     PARSE=(%01=(ENDBEFR=C' ',
                                 FIXLEN=30), 
                            %02=(FIXLEN=30))),
    
            IFTHEN=(WHEN=INIT, 
                     BUILD=(1,4,2X,X,%02)), 
    
            IFTHEN=(WHEN=INIT, 
                     OVERLAY=(5:1,2)), 
    
            IFTHEN=(WHEN=INIT, 
                      FINDREP=(IN=(C'A', 
                                   C'E', 
                                   C'I', 
                                   C'O', 
                                   C'U'), 
                               OUT=C'', 
                               STARTPOS=8, 
                               ENDPOS=38)), 
    
            IFTHEN=(WHEN=(1,2,BI,EQ,5,2,BI), 
                    OVERLAY=(7:C'N')) 
    

    If you run that, you will see the flag (third data-position) is now space (for a vowel present) or "N". Don't worry that all the "A"s have disappeared, they are still tucked away in %02.

    OVERLAY can make changes to the current record without creating a new, replacement record (which is what BUILD does). You'll see OVERLAY used below to get the new record-length after a new current record has been created (the BUILD would get the original record-length from the RDW).

    A similar process for the other task.
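
    As a stand-alone sketch of that other task (the length of the first field), the same save-length, FINDREP, calculate pattern can be run on its own. The layout here is an assumption chosen for illustration only: a two-byte extension at positions 5-6 followed by %01 at positions 7-36, which is not the layout the full code uses. The result is a two-byte binary value, so view the output in hex to check it:

      OPTION COPY

      INREC IFTHEN=(WHEN=INIT,
                     PARSE=(%01=(ENDBEFR=C' ',
                                 FIXLEN=30),
                            %02=(FIXLEN=30))),

    * RDW, TWO BYTES RESERVED FOR THE LENGTH, THEN THE FIRST PARSED FIELD.
            IFTHEN=(WHEN=INIT,
                     BUILD=(1,4,2X,%01)),

    * SAVE THE RECORD-LENGTH (36) BEFORE THE RECORD IS SHORTENED.
            IFTHEN=(WHEN=INIT,
                     OVERLAY=(5:1,2)),

    * REMOVE THE BLANKS FROM THE COPY OF THE FIELD, SHORTENING THE RECORD.
            IFTHEN=(WHEN=INIT,
                     FINDREP=(IN=C' ',
                              OUT=C'',
                              STARTPOS=7,
                              ENDPOS=36)),

    * NEW LENGTH MINUS SAVED LENGTH PLUS 30 IS THE LENGTH OF THE DATA.
            IFTHEN=(WHEN=INIT,
                     OVERLAY=(5:1,2,BI,SUB,5,2,BI,ADD,+30,
                              TO=BI,LENGTH=2))

    For MARK, the BUILD gives a 36-byte record, the FINDREP removes 26 blanks to leave 10 bytes, and 10 - 36 + 30 = 4.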

    I've included some additional test-data and made further assumptions about your SORT order. Here's full, annotated (the comments can remain, they do not affect the processing), code:

    * PARSE CURRENT INPUT TO GET TWO FIELDS, HELD SEPARATELY FROM THE RECORD. 
    * 
      INREC IFTHEN=(WHEN=INIT, 
                     PARSE=(%01=(ENDBEFR=C' ', 
                                 FIXLEN=30), 
                            %02=(FIXLEN=30))), 
    
    * MAKE A NEW CURRENT RECORD, RDW FROM EXISTING RECORD, THREE EXTENSIONS, AND 
    * A COPY OF THE FIRST PARSED FIELD. 
    * 
            IFTHEN=(WHEN=INIT, 
                    BUILD=(1,4, 
                           2X, 
                           2X, 
                           X, 
                           %01)), 
    
    * STORE THE LENGTH OF THE NEW CURRENT RECORD ON THE CURRENT RECORD. 
    * 
            IFTHEN=(WHEN=INIT, 
                     OVERLAY=(5: 
                                1,2)), 
    
    * REPLACE BLANKS WITH "NOTHING" WITHIN THE COPY OF THE PARSED FIELD. THIS WILL 
    * AUTOMATICALLY ADJUST THE RDW ON THE CURRENT RECORD. 
    * 
            IFTHEN=(WHEN=INIT, 
                      FINDREP=(IN=C' ', 
                               OUT=C'', 
                               STARTPOS=10, 
                               ENDPOS=40)), 
    
    * CALCULATE THE LENGTH OF THE NON-BLANKS IN THE FIELD, BY SUBTRACTING PREVIOUS 
    * STORED RECORD-LENGTH FROM CURRENT RECORD-LENGTH (FIRST TWO BYTES, BINARY, OF 
    * RDW) AND ADDING 30 (LENGTH OF PARSED FIELD). 
    * 
            IFTHEN=(WHEN=INIT, 
                    OVERLAY=(5: 
                               1,2,BI, 
                               SUB, 
                                5,2,BI, 
                               ADD, 
                                +30, 
                               TO=BI, 
                               LENGTH=2)), 
    
    * MAKE A NEW CURRENT RECORD, COPYING RDW AND THE VALUE CALCULATED ABOVE, BLANKS
    * (COULD BE COPIED) AND THEN THE SECOND PARSED FIELD. 
    * 
            IFTHEN=(WHEN=INIT, 
                     BUILD=(1,4, 
                            5,2, 
                            2X, 
                            X, 
                            %02)), 
    
    * AGAIN SAVE THE LENGTH OF THE NEW CURRENT RECORD. 
    * 
            IFTHEN=(WHEN=INIT, 
                     OVERLAY=(7: 
                                1,2)), 
    
    * CHANGE ALL VOWELS TO "NOTHING". THIS WILL AUTOMATICALLY ADJUST THE RDW. FOR
    * MIXED-CASE JUST EXTEND THE IN TO INCLUDE LOWER-CASE VOWELS AS WELL. 
    * 
            IFTHEN=(WHEN=INIT, 
                     FINDREP=(IN=(C'A', 
                                  C'E', 
                                  C'I', 
                                  C'O', 
                                  C'U'), 
                              OUT=C'', 
                              STARTPOS=10, 
                              ENDPOS=40)), 
    
    * CALCULATE NUMBER OF VOWELS. 
    * 
            IFTHEN=(WHEN=INIT, 
                     OVERLAY=(7: 
                                7,2,BI, 
                               SUB, 
                                1,2,BI, 
                               TO=BI, 
                               LENGTH=2)), 
    
    * MAKE A NEW CURRENT RECORD TO BE SORTED, WITH BOTH PARSED FIELDS. 
    * 
            IFTHEN=(WHEN=INIT, 
                     BUILD=(1,4, 
                            5,2, 
                            7,2, 
                            9,1, 
                            %01, 
                            %02)), 
    
    * FLAG RECORDS WITH NO VOWEL IN THE SECOND FIELD AS "N". THOSE WITH A VOWEL 
    * KEEP A BLANK FLAG, WHICH SORTS FIRST ("OUTSORT"). 
    * 
            IFTHEN=(WHEN=(7,2,BI,EQ,0), 
                     OVERLAY=(9: 
                                C'N')) 
    
    * SORT ON "OUTSORT FLAG", LENGTH OF NAME (DESCENDING), NAME, 2ND FIELD. 
      SORT FIELDS=(9,1,CH,A, 
                   5,2,CH,D, 
                   10,30,CH,A, 
                   40,30,CH,A) 
    
    * FIELDS NEEDED TO BE IN FIXED POSITION FOR SORT, AND EXTENSION FIELDS NO 
    * LONGER NEEDED. ALSO REMOVE BLANKS FROM THE TWO FIELDS, KEEPING A SEPARATOR   
    * BETWEEN THEM. THIS COULD INSTEAD BE DONE ON THE OUTFIL. 
    * 
      OUTREC BUILD=(1,4, 
                    10,60, 
                     SQZ=(SHIFT=LEFT, 
                          MID=C' ')) 
    
    * CURRENTLY THE VARIABLE-LENGTH RECORDS ARE ALL THE SAME LENGTH (64 BYTES) SO 
    * REMOVE TRAILING BLANKS. 
    * 
      OUTFIL VLTRIM=C' ' 
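
    The comment about mixed-case data in the code above can be sketched like this. It is a fragment only, intended to replace the vowel FINDREP in the full code (same STARTPOS and ENDPOS), on the assumption that lower-case vowels should count in the same way as upper-case ones:

            IFTHEN=(WHEN=INIT,
                     FINDREP=(IN=(C'A',C'E',C'I',C'O',C'U',
                                  C'a',C'e',C'i',C'o',C'u'),
                              OUT=C'',
                              STARTPOS=10,
                              ENDPOS=40)),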
    

    Extensive test-data:

    MARK AAAAAAA 
    AMY BBBBBB 
    PAULA CCCCCCCCCCC
    PAULA BDDDDDDDDDD
    IK JJJJJJJJJJO 
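
    Since the SORT order in the full code rests on assumptions, here is a sketch of an alternative SORT statement for one possible reading of the question: records with a vowel in the second field first, then the second field descending. The positions are those of the final INREC BUILD in the full code (flag at 9, length at 5-6, second field at 40-69). Whether this is the order actually wanted is an assumption:

    * VOWEL RECORDS FIRST (BLANK FLAG BEFORE "N"), THEN 2ND FIELD DESCENDING.
      SORT FIELDS=(9,1,CH,A,
                   40,30,CH,D)

    For the three records in the question this gives MARK, then PAULA, then AMY, which is the order of the expected output shown there. For the other requirement on its own (length of the first field, ascending), the statement would instead be SORT FIELDS=(5,2,BI,A).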
    

    You can also see how the code works by "removing a line at a time" from the end of the code, so you can see how the transformation reaches that point, or by running the code increasing a line at a time from the start of the code.

    It is important that you, and your colleagues, understand the code.

    There are some opportunities for some rationalisation. If you can work those out, it means you understand the code. Probably.