Search code examples
normalizationtransformationetlpentahokettle

how to normalize the records? e.g. several similar columns into rows


I am using Pentaho Kettle and thinking on way to normalize my flat file (csv) data. Eventually store it to database.

csv structure: item name, store1 sold quantity, store2 sold quantity, store...

expected result: item name, store name, sold quantity

Any guidance is appreciated.


Solution

  • You can do this with the Row Normalizer step as long as the number of stores is fixed or at least has a maximum. If it's variable, you'll have to use a JavaScript step, or a UDJC. See the docs for how to use these steps:

    PDI Transform Steps

    If it's variable, I'd consider preprocessing the file before loading. I've done this with Python and it works pretty well.