I have a DataStage (8.5) job that produces a CSV file as follows:
Key Date Amount
A 2014-07-24 $100
A 2014-07-23 $120
B 2014-07-24 $320
C 2014-07-24 $20
C 2014-07-23 $100
C 2014-07-22 $30
I would like to produce a new file with the header for each distinct Key to yield the following files from the above file:
Key Date Amount
A 2014-07-24 $100
A 2014-07-23 $120
Key Date Amount
B 2014-07-24 $320
Key Date Amount
C 2014-07-24 $20
C 2014-07-23 $100
C 2014-07-22 $30
The number of records for each key, and the key values themselves, will vary depending on when the job is executed.
Can this be accomplished in DataStage? If so, what would be the processing stages?
I used the Execute Command stage with the following awk statement, where key is $16:
awk -F "\",\"" 'NR==1 { hdr=$0; next } $16 != prev { prev=name=$16; gsub(/[^[:alnum:]_]/,"",name); $0 = hdr "\n" $0 } { print > ("/Directory/"name"_FILE.csv") }' /Directory/SOURCE_FILE.csv