How to generate output based on variable in a single spoon transformation?


I am new to pentaho. I am stuck with one issue.

I have a Spoon transformation that reads an input file, and an output file needs to be generated from it. My problem is how to give the output file a different name depending on the input file.

E.g. my input file has a field country (say, USA). I need to generate an output xls file whose name ends with this country, i.e. USA.xls. To do that, I mapped the country field to a variable, ${COUNTRY_NAME}, so that I can use it when building the output file name. But it is not working as expected.

I need to run this transformation multiple times, once per input file. Each input file has a country field with a different country name, so I need to generate a corresponding Excel file for each country.

E.g. in my first run I used a file with USA, so ${COUNTRY_NAME} was USA and the generated output file name ended with USA.xls. So far so good. But when I ran the transformation a second time with AUS, ${COUNTRY_NAME} was still USA, and the generated file name ended with USA.xls, not AUS.xls.

I have only one transformation. Please help me generate an xls file with a different name for each input file.

Thanks in advance


Solution

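  • You cannot do that in a single transformation. The reason is that all the steps of a transformation run in parallel, so when it comes to writing your output files, the data for USA and AUS are still mixed in the processing pipeline. For the same reason, a variable set by a Set Variables step cannot be used reliably in the transformation that sets it.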

    When things have to happen in a defined order, use a job. Basically you take your transformation as is, filter the data on ${country}, and write it to a file named filename${country}.xls (yes, you can: the variable is substituted and the strings are concatenated).

    The variable ${country} is defined in another transformation, which reads your data, keeps a unique row per country, and passes it to a Set Variables step.

    Then you make a job that runs the first transformation (define ${country}) and chains to the second (produce filename${country}.xls).

    Your PDI installation ships with a samples folder, which sits in the same directory as your spoon.bat/spoon.sh. The job samples/jobs/run_all/Run all sample transformations.kjb does almost the same thing as what you want to achieve.

    I know, at first it seems daunting for so simple a task. But once you get used to the reasoning, you'll discover all the benefits of being able to control parallel processing in so simple a way.
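
The job layout described above can be sketched outside PDI in plain Python, purely to illustrate the ordering: one step runs to completion and yields the country, and only then does the second step start with the file name already fixed. This is not PDI code. The function names (read_country, write_output, run_job) are hypothetical, the input is assumed to be a CSV file with a country column, and the output is written as .csv rather than .xls to keep the sketch dependency-free.

```python
import csv
import os


def read_country(input_path):
    """Stand-in for the first transformation: read the data, keep the
    unique country value, and hand it back (the 'Set Variables' part)."""
    with open(input_path, newline="") as f:
        countries = {row["country"] for row in csv.DictReader(f)}
    if len(countries) != 1:
        raise ValueError(f"expected exactly one country, got {sorted(countries)}")
    return countries.pop()


def write_output(input_path, out_dir, country):
    """Stand-in for the second transformation: the file name is built
    from the country before any row is processed."""
    out_path = os.path.join(out_dir, f"filename{country}.csv")
    with open(input_path, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Filter on the country, as the answer suggests.
            if row["country"] == country:
                writer.writerow(row)
    return out_path


def run_job(input_path, out_dir):
    """Stand-in for the job: step 1 finishes before step 2 begins."""
    country = read_country(input_path)
    return write_output(input_path, out_dir, country)
```

Calling run_job once per input file recomputes the country on every run, which is the behaviour the question is after: the value is never carried over from a previous run.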