I have the dataset:
What I need is to have all accounts for each concat group in one field, separated by commas. I was able to achieve this with a Row Denormaliser step followed by some regex. That works fine with a few accounts, but now I have a case with more than 10K accounts. How can I achieve it?
Both the Row Denormaliser step and the Concat Fields step (which would handle the concatenation part of this task) do not allow dynamic field names, as far as I can tell. So one unorthodox way of dealing with a large number of possible values in the denormalisation and concatenation would be to simply specify them all. For example, a field in the denormaliser step is defined as
<field>
  <field_name/>
  <key_value/>
  <target_name>field_1</target_name>
  <target_type>None</target_type>
  <target_format/>
  <target_length>-1</target_length>
  <target_precision>-1</target_precision>
  <target_decimal_symbol/>
  <target_grouping_symbol/>
  <target_currency_symbol/>
  <target_null_string/>
  <target_aggregation_type>-</target_aggregation_type>
</field>
So you could write a script that prints this template for every field and paste the output in place of the <fields> tag in the transformation's XML.
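A minimal sketch of such a generator script, assuming the field definition shown above and a naming scheme of `field_1`, `field_2`, and so on (adjust the count and the template to your transformation):

```python
# Print a Row Denormaliser <field> block for each of n target fields.
# The output is meant to be pasted in place of the <fields> section of
# the transformation XML. Names and count here are assumptions.
TEMPLATE = """\
<field>
  <field_name/>
  <key_value/>
  <target_name>field_{i}</target_name>
  <target_type>None</target_type>
  <target_format/>
  <target_length>-1</target_length>
  <target_precision>-1</target_precision>
  <target_decimal_symbol/>
  <target_grouping_symbol/>
  <target_currency_symbol/>
  <target_null_string/>
  <target_aggregation_type>-</target_aggregation_type>
</field>"""


def render_fields(n: int) -> str:
    """Return the concatenated XML for n generated target fields."""
    return "\n".join(TEMPLATE.format(i=i) for i in range(1, n + 1))


if __name__ == "__main__":
    # e.g. 10000 fields for the 10K-accounts case
    print(render_fields(10000))
```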
Note: This is not fit for production. It is a workaround if you need to run the task once or maybe twice to import some data. I wouldn't want to maintain an ETL process where this was deployed. A proper solution probably involves a custom step or an external script. I will gladly be proven wrong on this.
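For the external-script route, the whole group-concat can also be done outside Kettle in a few lines. This is a hedged sketch assuming a CSV input with columns named `group` and `account` (both names are assumptions; adjust to your data):

```python
import csv
from collections import OrderedDict


def concat_accounts(in_path: str, out_path: str,
                    group_col: str = "group",
                    account_col: str = "account") -> None:
    """Write one row per group with its accounts joined by commas.

    Column names are assumptions about the input file; change the
    defaults to match your dataset.
    """
    # Preserve first-seen group order while collecting accounts per group.
    groups: "OrderedDict[str, list]" = OrderedDict()
    with open(in_path, newline="") as f:
        for row in csv.DictReader(f):
            groups.setdefault(row[group_col], []).append(row[account_col])

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([group_col, "accounts"])
        for key, accounts in groups.items():
            writer.writerow([key, ",".join(accounts)])
```

Since it streams the input row by row and only keeps the grouped strings in memory, 10K accounts is no problem; the resulting file can then be read back into the transformation if needed.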