Input file Layout: 01 to 10 - 10 Digit Acct# 53 to 01 - An indicator with values 'Y' or 'N' 71 to 10 - Time stamp (Rest of the fields are insignificant for this sort)
While sorting the input file by splitting and eliminating duplicates in two ways result in different results. I wanna know why?
Casei: Splitting and Eliminating duplicates in the same step.
SORT FIELDS=(01,10,CH,A,53,01,CH,A)
SUM FIELDS=NONE
OUTFIL FILES=01,
INCLUDE=(53,01,CH,C'Y',AND,71,10,CH,GT,&DATE2(-)),
OUTFIL FILES=02,
INCLUDE=(53,01,CH,C'N',AND,71,10,CH,GT,&DATE2(-)),
Caseii: Splitting and eliminating duplicates in two different steps:
STEP:01
SORT FIELDS=(01,10,CH,A,53,01,CH,A)
SUM FIELDS=NONE
STEP:02
SORT FIELDS=COPY
OUTFIL FILES=01,
INCLUDE=(53,01,CH,C'Y',AND,71,10,CH,GT,&DATE2(-)),
OUTFIL FILES=02,
INCLUDE=(53,01,CH,C'N',AND,71,10,CH,GT,&DATE2(-)),
These two steps are resulting different output. Do u see any difference between both cases? Please clarify.
You are asking to sort on an Account Number (10 characters ascending) then on an Indicator (1 character ascending). These two fields alone determine the key of the record - Timestamp is not part of the sort key. Consequently if there are two or more records with the same key they could be placed in any (random) order by the sort. No telling what order the Timestamp values will appear.
Keeping the above in mind, consider what happens when you have two records with the same key but different Timestamp values. One of these Timestamp values meets the given INCLUDE criteria and the other one doesn't. The SUM FIELDS=NONE parameter is asking to remove duplicates based on the key. It does this by grouping all of the records with the same key together and then selecting the last one in the group. Since key does not include the Timestamp the choosen record is essentially a random event. Consequently it is unpredictable as to whether you get the record that meets the subsequent INCLUDE condition.
There are a couple of ways to fix this:
A stable sort causes records having the same sort key to maintain their same relative positions after the sort. This will preserve the original order of the Timestamp values in your file given the same key. When the removal of duplicates occurs DFSORT will choose the last record from the set of duplicates. This should bring the predicability to the duplicate elimination process you are looking for. Specify a stable sort by adding an OPTIONS EQUALS control card before the SORT card.
EDIT Comment: ...picks the VERY FIRST record
The book I based my original answer on clearly stated the last record in a group of records with the same key would be selected when SUM=NONE is specified. However, it is always best to consult the vendors own manuals. IBM's DFSORT Application Programming Guide only states that one record with each key will be selected. However, it also has the following note:
The FIRST operand of ICETOOL's SELECT operator can be used to perform the same function as SUM FIELDS=NONE with OPTION EQUALS. Additionally, SELECT's FIRSTDUP, ALLDUPS, NODUPS, HIGHER(x), LOWER(y), EQUAL(v), LASTDUP, and LAST operands can be used to select records based on other criteria related to duplicate and non-duplicate keys. SELECT's DISCARD(savedd) operand can be used to save the records discarded by FIRST, FIRSTDUP, ALLDUPS, NODUPS, HIGHER(x), LOWER(y), EQUAL(v), LASTDUP, or LAST. See SELECT Operator for complete details on the SELECT operator.
Based on this information I would suggest using ICETOOL's SELECT operator to select the correct record.
Sorry for the misinformation.