
Splitting DataStage Output based on Key Column


I have a DataStage (8.5) job that produces a CSV file as follows:

Key    Date        Amount
A      2014-07-24  $100
A      2014-07-23  $120
B      2014-07-24  $320
C      2014-07-24  $20
C      2014-07-23  $100
C      2014-07-22  $30

I would like to split this into a separate file per distinct Key, each with its own header row, yielding the following three files from the data above:

Key    Date        Amount
A      2014-07-24  $100
A      2014-07-23  $120

Key    Date        Amount
B      2014-07-24  $320

Key    Date        Amount
C      2014-07-24  $20
C      2014-07-23  $100
C      2014-07-22  $30

The number of records for each key, and the key values themselves, will vary depending on when the job is executed.

Can this be accomplished in DataStage? If so, what would be the processing stages?


Solution

  • I used the Execute Command stage with the following awk statement. Note that this runs against my actual file, where fields are quoted CSV (hence the -F "\",\"" separator) and the key is field $16:

    awk -F "\",\"" '
        NR==1       { hdr=$0; next }                    # capture the header row
        $16 != prev { prev=name=$16                     # key changed: start a new output file
                      gsub(/[^[:alnum:]_]/,"",name)     # sanitize the key for use in a file name
                      $0 = hdr "\n" $0 }                # prepend the header to the first data row
                    { print > ("/Directory/"name"_FILE.csv") }
    ' /Directory/SOURCE_FILE.csv
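
    The same approach, adapted to the three-column sample shown in the question (key in field $1, whitespace-delimited rather than quoted CSV, so the -F separator is dropped), can be sketched as follows. The paths and file names here are illustrative, not from the original job:

```shell
# Build a copy of the sample data from the question (illustrative path).
mkdir -p /tmp/split_demo
cat > /tmp/split_demo/SOURCE_FILE.txt <<'EOF'
Key    Date        Amount
A      2014-07-24  $100
A      2014-07-23  $120
B      2014-07-24  $320
C      2014-07-24  $20
C      2014-07-23  $100
C      2014-07-22  $30
EOF

# Split by key (field $1), writing the header at the top of each output file.
# Assumes the input is grouped/sorted by key, as in the sample.
awk 'NR==1      { hdr=$0; next }                 # capture the header row
     $1 != prev { prev=name=$1                   # key changed: new output file
                  gsub(/[^[:alnum:]_]/,"",name)  # sanitize key for a file name
                  $0 = hdr "\n" $0 }             # prepend the header
                { print > ("/tmp/split_demo/" name "_FILE.txt") }' \
    /tmp/split_demo/SOURCE_FILE.txt
```

    This produces A_FILE.txt, B_FILE.txt, and C_FILE.txt, each beginning with the header row followed by that key's records. Because awk only starts a new file when the key value changes, the input must be grouped by key; if it is not, sort it first (e.g. via a Sort stage or `sort` in the command).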