
Pentaho Kettle: get files from SFTP within a transformation


I need to read the source files to be processed from an FTP location. Each Department is assigned its own FTP folder. I would like to do it like this:

(1) Get the list of all Dept IDs and pass it row by row to a Job.
(2) In that Job, get the access credentials for the current Dept ID and put them into variables.
(3) Access the files from the Dept-specific FTP folder, process them, and put the processed files back onto the FTP.

In my Kettle version (CE 5.0.1) I could not find a way to get files from SFTP within a transformation. There is only a step available at the Job level. If there were a step at the transformation level, I could pass the access credentials from a Get Variables step, so that it would work for all the Dept IDs.

How can this be done?


Solution

  • Two approaches:

    Option A (recommended):

    • Parent job: calls a sub-job, executing it once per row (one row per Dept ID);
    • Sub-job: gets the file via SFTP and passes it to a transformation;
    • Transformation: reads one file.

    Option B (experimental):

    From PDI 5 onwards there are Transformation Executor and Job Executor steps, both of which can be called from within a transformation. They exist largely to allow a simpler iteration model for this kind of task: the Job Executor step can run a job, including one that contains the job-level SFTP entry, for each incoming row.
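
The control flow of Option A can be sketched outside Kettle to make the nesting explicit. This is only an illustrative model in plain Python, not Kettle code: `Credentials`, `run_parent_job`, `run_sub_job` and the four callables (`lookup`, `fetch`, `process`, `upload`) are hypothetical names standing in for the parent job, the sub-job, the credential lookup, the job-level SFTP get/put entries and the transformation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Credentials:
    """Hypothetical per-department SFTP settings (the 'variables' of step 2)."""
    host: str
    user: str
    password: str
    folder: str


# Hypothetical stand-ins for Kettle pieces; none of these are Kettle APIs.
Lookup = Callable[[str], Credentials]              # Dept ID -> credentials
Fetch = Callable[[Credentials], Dict[str, str]]    # SFTP get: file name -> content
Process = Callable[[str], str]                     # transformation: one file in, one out
Upload = Callable[[Credentials, str, str], None]   # SFTP put of the processed file


def run_sub_job(dept_id: str, lookup: Lookup, fetch: Fetch,
                process: Process, upload: Upload) -> List[str]:
    """Sub-job: runs once for a single Dept ID row."""
    creds = lookup(dept_id)                        # set variables for this run
    done: List[str] = []
    for name, content in sorted(fetch(creds).items()):
        upload(creds, name, process(content))      # transformation reads one file
        done.append(name)
    return done


def run_parent_job(dept_ids: List[str], lookup: Lookup, fetch: Fetch,
                   process: Process, upload: Upload) -> Dict[str, List[str]]:
    """Parent job: executes the sub-job once per incoming row."""
    return {dept_id: run_sub_job(dept_id, lookup, fetch, process, upload)
            for dept_id in dept_ids}
```

The point of the sketch is that the credentials are resolved inside the per-row sub-job, so the SFTP access can stay at the job level while the transformation only ever sees a single local file.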