Simplifying a DFSORT job that reads SMF to analyse a dataset's lifecycle

So I have a batch job that extracts SMF type 14, 15 and 17 records into 3 separate files and then formats the files to produce a list of which datasets were read, written to and deleted by which jobs. This is then sorted by timestamp so you can see the 'lifecycle' for a particular dataset.

However, I know that DFSORT is pretty powerful, and I think my initial step to separate out the type 14, 15 and 17 records isn't necessary and that it could all be done in one step. I'm not really sure where to start, though, as DFSORT/ICETOOL has become pretty sophisticated.

Here's my current JCL:

//JBSP03DL JOB (JSDBBSP,P10),'SMF F NOW',                              
//         NOTIFY=&SYSUID,
//         CLASS=L,
//         MSGCLASS=X,
//         REGION=8M                                            
//*
//DELETE   EXEC PGM=IEFBR14
//OUTDSN DD DISP=(MOD,DELETE),DSN=JSDBSP.JBSP03.DSLIFE.TXT,
//       UNIT=SYSDA
//*
//SMFDUMP  EXEC PGM=IFASMFDP,REGION=6M
//*
//SYSPRINT DD SYSOUT=*
//* Extract type 14, 15 and 17 records into 3 temporary datasets
//DUMPIN   DD DISP=SHR,DSN=JSHSMF.SMF.JXSF.MANDUMP
//*
//DUMP14   DD  DISP=(,PASS),DSN=&&TYPE14,
//            UNIT=SYSDA,SPACE=(CYL,(500,200),RLSE),
//            BUFNO=20,BLKSIZE=27998,LRECL=32760,RECFM=VBS
//DUMP15   DD  DISP=(,PASS),DSN=&&TYPE15,
//            UNIT=SYSDA,SPACE=(CYL,(500,200),RLSE),
//            BUFNO=20,BLKSIZE=27998,LRECL=32760,RECFM=VBS
//DUMP17   DD  DISP=(,PASS),DSN=&&TYPE17,
//            UNIT=SYSDA,SPACE=(CYL,(500,200),RLSE),
//            BUFNO=20,BLKSIZE=27998,LRECL=32760,RECFM=VBS
//*
//SYSIN    DD *
INDD(DUMPIN,OPTIONS(DUMP))
OUTDD(DUMP14,TYPE(14))
OUTDD(DUMP15,TYPE(15))
OUTDD(DUMP17,TYPE(17))
//*
//SORTPROC PROC
//SORTWRTE EXEC PGM=SORT,REGION=8M
//SORTOUT  DD  DISP=MOD,DSN=&&SORTTMP,
//             SPACE=(CYL,(20,20)),UNIT=SYSDA
//SYSOUT   DD   SYSOUT=*
//SYSPRINT DD   SYSOUT=*
//SORTWK01 DD   DISP=(NEW,DELETE),DSN=&&TEMPSORT,UNIT=SYSDA,
//  SPACE=(CYL,(50,50))
//         PEND
//*
//* Process the type 14 records
//TYPE14   EXEC SORTPROC
//SORTIN   DD   DISP=SHR,DSN=&&TYPE14
//SORTOUT  DD  DISP=(,PASS),DSN=&&SORTTMP,
//             SPACE=(CYL,(20,20)),UNIT=SYSDA,
//             LRECL=133
//SYSIN    DD   *
  SORT FIELDS=(11,4,PD,A,7,4,PD,A)
  SUM FIELDS=NONE
  OUTREC BUILD=(11,4,DT1,EDIT=(TTTT-TT-TT),   DATE OF RECORD
                C' AT ',
                7,4,TM4,EDIT=(TT:TT:TT.TT), TIME OF RECORD
                C' ',
                69,44,
                C' was opened by ',
                19,8),CONVERT
//*
//* Process the type 15 records
//TYPE15   EXEC SORTPROC
//SORTIN   DD   DISP=SHR,DSN=&&TYPE15
//SYSIN    DD   *
  SORT FIELDS=(11,4,PD,A,7,4,PD,A)
  SUM FIELDS=NONE
  OUTREC BUILD=(11,4,DT1,EDIT=(TTTT-TT-TT),   DATE OF RECORD
                C' AT ',
                7,4,TM4,EDIT=(TT:TT:TT.TT), TIME OF RECORD
                C' ',
                19,8,
                C' opened ',
                69,44,
                C' for output'),CONVERT
//*
//* Process the type 17 records
//TYPE17   EXEC SORTPROC
//SORTIN   DD   DISP=SHR,DSN=&&TYPE17
//SYSIN    DD   *
  SORT FIELDS=(11,4,PD,A,7,4,PD,A)
  SUM FIELDS=NONE
  OUTREC BUILD=(11,4,DT1,EDIT=(TTTT-TT-TT),   DATE OF RECORD
                C' AT ',
                7,4,TM4,EDIT=(TT:TT:TT.TT), TIME OF RECORD
                C' ',
                19,8,
                C' deleted ',
                44,44),CONVERT
//*
//*  Finally sort the output file by the date & time stamp 
//*
//FINAL   EXEC SORTPROC
//SORTIN   DD   DISP=(OLD,DELETE),DSN=&&SORTTMP
//SORTOUT  DD   DISP=(NEW,CATLG),DSN=JSDBSP.JBSP03.DSLIFE.TXT,
//            UNIT=SYSDA,LRECL=121,RECFM=FB,SPACE=(CYL,(20,30))
//SYSIN    DD   *
SORT FIELDS=(1,23,CH,A)

Is it possible to do this without separating the 14, 15 and 17 records into separate files?

Edit: the above JCL does exactly what I want, but I'd like to be able to filter by dataset name or job name if possible, as this can produce a lot of output, which is then too big for ISPF Edit or View for further analysis.

Edit:

    Type 14:
Offset  Hex  Name            Length  Format  Description
5       5    SMF14RTY        1       binary  Record type 14 (X'0E').
18      12   SMF14JBN        8       EBCDIC  Job name.
68      44   SMF14_JFCBDSNM  44      EBCDIC  Data set name (DSNAME=).

    Type 15:
Offset  Hex  Name            Length  Format  Description
5       5    SMF15RTY        1       binary  Record type 15 (X'0F').
18      12   SMF15JBN        8       EBCDIC  Job name.
68      44   SMF15_JFCBDSNM  44      EBCDIC  Data set name (DSNAME=).

    Type 17:
Offset  Hex  Name            Length  Format  Description
5       5    SMF17RTY        1       binary  Record type 17 (X'11').
18      12   SMF17JBN        8       EBCDIC  Job name.
44      2C   SMF17DSN        44      EBCDIC  Data set name.

A further enhancement would be to check if an OPEN was actually creating the dataset. I should also add RENAMES, otherwise you might lose track of what happened to a particular dataset.

Edit:

Following Bill's guidelines, my JCL is now:

//DELETE   EXEC PGM=IEFBR14                                   
//OUTDSN DD DISP=(MOD,DELETE),DSN=JSDBSP.JBSP03.DSLIFE.TXT,   
//       UNIT=SYSDA                                           
//*                                                           
//SORTWRTE EXEC PGM=SORT,REGION=8M                            
//*                                                           
//SORTIN   DD   DISP=SHR,DSN=JSHSMF.SMF.JXSG.MANDUMP          
//SORTOUT  DD  DISP=(MOD,CATLG),DSN=JSDBSP.JBSP03.DSLIFE.TXT, 
//             SPACE=(CYL,(20,20)),                           
//             UNIT=SYSDA,LRECL=133                           
//*                                                           
//SYSOUT   DD   SYSOUT=*                                      
//SYSPRINT DD   SYSOUT=*                                      
//SYMNOUT  DD   SYSOUT=*                                      
//SYMNAMES DD   *                                             
 SMF-RECORD-TYPE,5,1,BI                                       
 SMF-JOB-NAME,19,8,CH                                         
 SMF-14-15-DSN,69,44,CH                                       
 SMF-17-DSN,44,44,CH                                          
 SMF-DATE,11,4,DT1                                            
 SMF-TIME,7,4,TM4                                             
//*                                                           
//SYSIN    DD   *                                             
  SORT FIELDS=(11,4,PD,A,7,4,PD,A)                            
  OUTREC IFTHEN=(WHEN=(SMF-RECORD-TYPE,EQ,14),                
              BUILD=(SMF-DATE,EDIT=(TTTT-TT-TT),              
                     C' AT ',                                 
                     SMF-TIME,EDIT=(TT:TT:TT.TT),             
                     C' ',                                    
                     SMF-14-15-DSN,                           
                     C' was opened by ',                      
                     SMF-JOB-NAME)),CONVERT

But this gives:

OUTREC IFTHEN=(WHEN=(5,1,BI,EQ,14),BUILD=(11,4,DT1,EDIT=(TTTT-TT-TT),C' AT ',7,4
,TM4,EDIT=(TT:TT:TT.TT),C' ',69,44,C' was opened by ',19,8)),CONVERT            
                                                            *                                  
WER268A  OUTREC STATEMENT  : SYNTAX ERROR

Leaving off the

,CONVERT

gives me :

WER235A  OUTREC   RDW NOT INCLUDED

Edit - latest update:

Just trying to isolate type 14 records, so current input is now:

//SYMNAMES DD   *       
 SMF-RECORD-TYPE,6,1,BI 
 SMF-JOB-NAME,11,8,CH   
 SMF-14-15-DSN,65,44,CH 
 SMF-17-DSN,44,44,CH    
 SMF-DATE,11,4,DT1      
 SMF-TIME,7,4,TM4       

//SYSIN    DD   *
    SORT FIELDS=(11,4,PD,A,7,4,PD,A)                  
    OUTFIL IFTHEN=(WHEN=(SMF-RECORD-TYPE,EQ,14),      
                BUILD=(1,4,SMF-DATE,EDIT=(TTTT-TT-TT),
                       C' AT ',                       
                       SMF-TIME,EDIT=(TT:TT:TT.TT),   
                       C' ',                          
                       SMF-14-15-DSN,                 
                       C' was opened by ',            
                       SMF-JOB-NAME))

Solution

Yes, and it is fairly painless.

IFTHEN=(WHEN= allows various types of conditional processing.

Here you can use IFTHEN=(WHEN=(logicalexpression) to make a case/select/evaluate-type structure:

IFTHEN=(WHEN=(5,1,BI,EQ,14),
         ...),
IFTHEN=(WHEN=(5,1,BI,EQ,15),
         ...),
IFTHEN=(WHEN=NONE,
         ...)

WHEN=NONE is the "catch-all", for when none of the previous tests is true. IFTHEN=(WHEN=(logicalexpression) stops for the current record when one test is true. Even if a second condition on the current record were to be true, it would not get actioned. If you want two or more "hits" in IFTHEN=(WHEN=(logicalexpression) then you have to use HIT=NEXT at the end of each test where you may want to "pass it on" to the next test. Here, that isn't relevant, since it is the same field tested for a single value.

IFTHEN can appear on INREC, OUTREC, or OUTFIL. You have your processing on OUTREC, so you would have (although see my later comment):

OUTREC IFTHEN=(WHEN=(5,1,BI,EQ,14),
                ...),
       IFTHEN=(WHEN=(5,1,BI,EQ,15),
                ...),
       IFTHEN=(WHEN=NONE,
                ...)

BUILD, OVERLAY and PARSE can be used within IFTHEN.
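
To make the skeleton concrete, here is a minimal sketch of the three branches, written with the symbol names suggested further down in this answer and the literals from the question. Each BUILD starts with 1,4 so that the RDW of the variable-length input is carried into the output, which therefore stays variable-length. The positions behind the symbols are an assumption to check against your data: the SMF manual gives zero-based offsets, while SORT positions start at 1 with the RDW in positions 1-4.

```
* Sketch only: one branch per SMF record type, RDW (1,4) kept
  OUTREC IFTHEN=(WHEN=(SMF-RECORD-TYPE,EQ,14),
           BUILD=(1,4,
                  SMF-DATE,EDIT=(TTTT-TT-TT),C' AT ',
                  SMF-TIME,EDIT=(TT:TT:TT.TT),C' ',
                  SMF-14-15-DSN,C' was opened by ',
                  SMF-JOB-NAME)),
         IFTHEN=(WHEN=(SMF-RECORD-TYPE,EQ,15),
           BUILD=(1,4,
                  SMF-DATE,EDIT=(TTTT-TT-TT),C' AT ',
                  SMF-TIME,EDIT=(TT:TT:TT.TT),C' ',
                  SMF-JOB-NAME,C' opened ',
                  SMF-14-15-DSN,C' for output')),
         IFTHEN=(WHEN=(SMF-RECORD-TYPE,EQ,17),
           BUILD=(1,4,
                  SMF-DATE,EDIT=(TTTT-TT-TT),C' AT ',
                  SMF-TIME,EDIT=(TT:TT:TT.TT),C' ',
                  SMF-JOB-NAME,C' deleted ',
                  SMF-17-DSN))
```

Records of any other type match none of the tests and pass through unchanged, so either filter them out beforehand or add a WHEN=NONE branch to deal with them.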

Some thoughts and tips.

I am suspicious of your SUM FIELDS=NONE. It drops any record whose key duplicates that of another record. Which of the duplicates is retained depends on EQUALS: if you use OPTION EQUALS, or EQUALS on the SORT (or MERGE), the first record is always retained. If you don't, the record which is retained can vary from run to run. EQUALS has some impact on performance.

Anyway, I'm not sure why you have SUM FIELDS=NONE here. Since the key is only the date and time, you can even get an "accidental" match across entirely different data sets.
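
If the duplicate removal does turn out to be wanted, a minimal sketch of making it repeatable looks like this, assuming the SORT statement from the question is kept as it is:

```
* Keep the first of any records with equal keys, run after run
  OPTION EQUALS
  SORT FIELDS=(11,4,PD,A,7,4,PD,A)
  SUM FIELDS=NONE
```

Without the OPTION EQUALS line, the same cards are still valid, but which duplicate survives is not guaranteed.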

If you are going to SORT and then select only part of the data (in OUTREC or OUTFIL), then always consider "cutting down" the record which is to be sorted, so that it only includes the data you will later use. When SORTing, the less data, the less time, memory and temporary storage is used.
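
The same idea applies at the record level. When the whole SMF dump is read in one step, most of the records are of no interest, so it is worth dropping them before the sort sees them. A minimal sketch, assuming the SMF-RECORD-TYPE symbol from the SYMNAMES suggested in this answer:

```
* Drop everything except types 14, 15 and 17 before the sort
  INCLUDE COND=(SMF-RECORD-TYPE,EQ,14,OR,
                SMF-RECORD-TYPE,EQ,15,OR,
                SMF-RECORD-TYPE,EQ,17)
```

Trimming the fields as well would mean doing the BUILD on INREC instead of OUTREC, in which case the SORT FIELDS have to refer to the positions in the reformatted record.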

Consider using dynamic allocation (DYNALLOC) for temporary storage, and remove your SORTWKnn DD statements from the JCL (you only have one here, but...). Dynamic allocation of workspace means you don't have to think much at all about the workspace (unless you have huge datasets with widely variant record-lengths for the data) and you don't "overallocate".
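
As a sketch, this is the DFSORT spelling of the option. The device name and the count of 4 are illustrative values, not recommendations, and the equivalent in other SORT products should be checked in their own manual:

```
* DFSORT spelling; up to 4 work data sets on SYSDA
  OPTION DYNALLOC=(SYSDA,4)
```

With this in SYSIN, the SORTWK01 DD in the procedure can be removed.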

SORT Symbols. Symbols allow you to name your data, so references to the same field can be done by name, and SORT looks after the less thrilling task of typing the start-position and length each time. It also reduces the amount of comments required, because the field already has a name, which you can make descriptive.

Symbols are defined in a separate data set (F/FB 80) with a SYMNAMES DD. The translated symbols (which also provide a record of what was used) are held in a SYMNOUT dataset, which is not required, but is useful.

SORT then applies the symbols to your control cards, and as well as showing your original source in the SYSOUT, shows you the translated cards.

Symbols for this task could be specified along these lines

SMF-RECORD-TYPE,5,1,BI
SMF-JOB-NAME,18,8,CH
SMF-14-15-DSN,68,44,CH
SMF-17-DSN,44,44,CH
SMF-DATE,11,4,DT1
SMF-TIME,7,4,TM4

Then you can replace the multiple definitions of the same field with the symbol, and let SORT do the work.

If you want to do selection on data sets, you can look at using the PARM and the special symbols JP0-JP9. Or hard-coding. Or generating the SORT control cards from a list of data sets, or by using JOINKEYS.
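
A minimal sketch of the PARM route for a job-name filter follows. The job name PAYJOB01 is a made-up value, the symbols are the ones suggested above, and JPn is a DFSORT feature, so check whether your SORT product supports it before relying on it. The hard-coded equivalent replaces JP0 in the INCLUDE with C'PAYJOB01'.

```
//SORTWRTE EXEC PGM=SORT,PARM='JP0"PAYJOB01"'
//* SORTIN, SORTOUT, SYMNAMES and so on as before
//SYSIN    DD   *
  INCLUDE COND=((SMF-RECORD-TYPE,EQ,14,OR,
                 SMF-RECORD-TYPE,EQ,15,OR,
                 SMF-RECORD-TYPE,EQ,17),AND,
                SMF-JOB-NAME,EQ,JP0)
```

Filtering on data set name needs a little more care, because the name is in a different place in type 17 records than in types 14 and 15, so each test on the name has to be paired with a test on the record type.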

Oh, and I know that you know, but you are actually using SYNCSORT. DFSORT does not have CONVERT on OUTREC, but it does on OUTFIL. To be transportable, here simply change your OUTREC to OUTFIL.
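
One arrangement that avoids having to find out whether CONVERT can be combined with IFTHEN in your product is to do the conversion as a separate, final step on OUTFIL. This is a sketch under assumptions: OUTREC has already produced variable-length text records with the RDW kept in positions 1-4, and 97 is the length of the longest of the three layouts in the question (the type 15 line). VTOF is the DFSORT name for the same conversion as CONVERT.

```
* Assumes OUTREC left variable-length text records, RDW in 1-4
* 97 = longest of the three layouts built in the question
  OUTFIL FNAMES=SORTOUT,VTOF,VLFILL=C' ',BUILD=(5,97)
```

Shorter records are padded with blanks up to 97 bytes, so the LRECL on the SORTOUT DD should either match that or be left off.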