Search code examples
group-bymainframesyncsort

GROUP BY CLAUSE using SYNCSORT


I have some content in a file on which I must generate statistics such as how many of records are of type - 1, type - 2 etc. Number of types can change and is unknown to the code until file arrives. In a SQL system, I can do this using COUNT and GROUP BY clause. But I am not sure if I can do this using SYNCSORT or COBOL program. Would anyone here have an idea on how I can implement 'GROUP BY' type query on a file using SYNCSORT.

Sample Data:

TYPE001 SUBTYPE001 TYPE01-DESC
TYPE001 SUBTYPE002 TYPE01-DESC
TYPE001 SUBTYPE003 TYPE01-DESC
TYPE002 SUBTYPE001 TYPE02-DESC
TYPE002 SUBTYPE004 TYPE02-DESC
TYPE002 SUBTYPE008 TYPE02-DESC

I want to get the information such as TYPE001 ==> 3 Records, TYPE002 ==> 3 Records. What the code doesn't know until runtime is the TYPENNN value


Solution

  • You show data already in sequence, so there is no need to sort the data itself, which makes SUM FIELDS= with SORT a poor solution if anyone suggests it (plus code for the formatting).

    MERGE with a single input file and SUM FIELDS= would be better, but still require the code for formatting.

    The simplest way to produce output which may suit you is to use OUTFIL reporting functions:

     OPTION COPY 
     OUTFIL NODETAIL, 
            REMOVECC, 
            SECTIONS=(1,7, 
                      TRAILER3=(1,7, 
                                ' ==> ', 
                                COUNT=(M10,LENGTH=3),
                                ' Records'))
    

    The NODETAIL says "remove all the data lines". The REMOVECC says "although it is a report, don't use printer-control characters on position one of the output records". The SECTIONS says "we're going to use control-breaks, and here they (it in this case) are". In this case, your control-field is 1,7. The TRAILER3 defines the output which will be produced at each control-break: COUNT here is the number of records in that particular break. M10 is an editing mask which will change leading zeros to blanks. The LENGTH gives a length to the output of COUNT, three is chosen from your sample data with sub-types being unique and having three digits as the unique part of the data. Change to whatever suits your actual data.

    You've not been clear, and perhaps you want the output "floating" (3bb instead of bb3, where b represents a blank)? That would require more code...