kettle, pentaho-data-integration, pdi

PDI: how to skip files that have already been processed?


My job and transformation are shown below:

[Screenshots: Job, Transformation]

I want to process files from an FTP server and a shared folder. My team puts CSV files there every day whenever there are new ones. Files on the FTP server and in the shared folder are kept for 7 days before being removed.

My question: if I already processed A.csv and B.csv yesterday, how can I process only C.csv today and skip A.csv and B.csv, even though those files are still in the same folder? I don't want to move or delete files that have already been processed.


Solution

  • It is better to create a table that stores the names of the processed files. Add a step that checks whether each file name already exists in that table: if it does not, process the file; otherwise, skip it.
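A minimal sketch of the tracking-table idea, written outside PDI so the logic is easy to see. It assumes a hypothetical table named `processed_files` keyed on the file name, and uses SQLite only so the example is self-contained; the question does not say which database is available. A file is recorded only after its handler returns, so a file that fails is retried on the next run.

```python
import sqlite3
import tempfile
from pathlib import Path


def ensure_table(conn: sqlite3.Connection) -> None:
    """Create the hypothetical processed_files table if it is missing."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS processed_files (
            file_name    TEXT PRIMARY KEY,
            processed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )


def new_files(conn: sqlite3.Connection, file_names: list[str]) -> list[str]:
    """Return only the names that are not yet recorded in the table."""
    seen = {row[0] for row in conn.execute("SELECT file_name FROM processed_files")}
    return [name for name in file_names if name not in seen]


def mark_processed(conn: sqlite3.Connection, file_name: str) -> None:
    """Record a file as processed; re-inserting the same name is ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO processed_files (file_name) VALUES (?)", (file_name,)
    )
    conn.commit()


def process_folder(conn: sqlite3.Connection, folder: str, handler) -> list[str]:
    """Process every *.csv in folder that is not in the table yet.

    Files are left in place; nothing is moved or deleted.
    Returns the names processed during this run.
    """
    ensure_table(conn)
    names = sorted(p.name for p in Path(folder).glob("*.csv"))
    done = []
    for name in new_files(conn, names):
        handler(Path(folder) / name)  # your actual processing goes here
        mark_processed(conn, name)    # only reached if handler did not raise
        done.append(name)
    return done


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        conn = sqlite3.connect(":memory:")
        for name in ("A.csv", "B.csv"):
            (Path(folder) / name).write_text("id\n1\n")
        print(process_folder(conn, folder, lambda path: None))  # day 1: ['A.csv', 'B.csv']
        (Path(folder) / "C.csv").write_text("id\n2\n")
        print(process_folder(conn, folder, lambda path: None))  # day 2: ['C.csv']
```

Inside PDI, the same flow would probably be a transformation with Get File Names, a Database lookup against the tracking table, a Filter rows step on the lookup result, and a Table output step that inserts each file name once it has been processed. If the FTP server and the shared folder can both contain a file with the same name, keying the table on source plus file name would avoid false skips. Since files disappear after 7 days, older rows could also be pruned, though keeping them is harmless.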