Search code examples
sortingetldatastage

Datastage Sortstage cluster key change vs key change


According to IBM's documentation for Sort Stage:

Create cluster key change column

This property appears for sort type DataStage and is optional. It is set False by default. If set True it tells the Sort stage to create the column clusterKeyChange in each output record. The clusterKeyChange column is set to 1 for the first record in each group where groups are defined by using a Sort Key Mode of Don't Sort (Previously Sorted) or Don't Sort (Previously Grouped). Subsequent records in the group have the clusterKeyChange column set to 0.

Create key change column

This property appears for sort type DataStage and is optional. It is set False by default. If set True it tells the Sort stage to create the column KeyChange in each output record. The KeyChange column is set to 1 for the first record in each group where the value of the sort key changes. Subsequent records in the group have the KeyChange column set to 0.

Those two options sound very similar to me.

  1. When should I use which option?

  2. How does the end result look like?


Solution

    1. You should use the clusterKeyChange if you got Don't Sort columns and you want them to be considered - like when you grouped / sorted it before. The Key change column is more often used.

    2. The result is an additional column showing 0 or 1 - 1 in case of a change - so the first row of a series of identical key vaues is marked to do some extra logic usually in a transformer.

    Hint: Note that there is a transformer functionality that could be very useful as well LastRowInGroup which lets you react (i.E. reset values etc.) before you will receive a new value in the transformer. So this function might replace the sort stage option above but it needs sorted input to the transformer.