Search code examples
datastage

How can I aggregate string data in DataStage?


I have the following data coming into DataStage:

customernumber hometelephone mobiletelephone
1234 NULL 07123456
1234 0120202 NULL

What I want out the other end is:

customernumber hometelephone mobiletelephone
1234 0120202 07123456

Simple I thought, just use an Aggregator - and indeed this does work to some degree. However, it converts the strings to floats, then back to strings again, resulting in my phone numbers becoming 7.1235e+06 (or similar).

Is there some other way to take the maximum value of a string?

What I essentially want would be easy in SQL:

SELECT customernumber, MAX(hometelephone), MAX(mobiletelephone) FROM <input> GROUP BY customernumber;


Solution

  • There are two options for this problem:

    • Aggregator stage: Use the property "Default to decimal output" to get a decimal output. Details can be found here. There is nothing wrong in converting data types.
    • Transformer: Aggregation cans also be done in a Transfomer stage by using LastRowInGroup to detect the group change.