kettle, pentaho-data-integration, pdi

PDI - Check data types of field


I'm trying to create a transformation that reads CSV files and checks the data type of each field in them.

For example, the standard says field A should be a one-character string, string(1), and field B should be an integer/number.

What I want is to check/validate this: if A is not string(1), set Status = Not Valid, and likewise if B is not an integer/number. Then every file with status Not Valid should be moved to an error folder.

I know I can use the Data Validator step for the check, but how do I move the files that have that status? I can't find any step that does it.


Solution

  • You can do the same as in this question. Once the rows are read, use a Group by step to get one flag per file. This time, however, you cannot do it in a single transformation; you need a job.

    Your use case is covered by the samples shipped with your PDI distribution. The sample is in the folder your-PDI/samples/jobs/run_all. Open Run all sample transformations.kjb and, in Get Files - Get all transformations.ktr, replace Filter 2 with your own logic, which should include a Group by so that you get one status per file rather than one status per row.

    In case you wonder why such a task needs such complex logic, remember that PDI starts all the steps of a transformation at the same time. That is its great power, but it also means you cannot know whether a file has to be moved until every one of its rows has been processed.

    Alternatively, there is the quick and dirty solution from your similar question: replace the Filter rows step with a type check, and the final Synchronize after merge with a Process files step set to Move.

    One final piece of advice: instead of checking the type with a Data Validator, which is a good solution in itself, you may use a JavaScript step like the one there. It is more flexible and easier to maintain in the long run.
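To make the two levels of checking concrete, here is a minimal sketch in plain JavaScript of the logic described above: a per-row type check, followed by a reduction to one status per file (what the Group by step would do). The function names, the row shape and the `filename` field are assumptions made for illustration, not part of PDI. The code runs under Node.js as written; inside a PDI JavaScript step the fields would be available as variables and the syntax may need adapting to the scripting engine in use.

```javascript
// Hypothetical helpers: names and row shape are assumptions for illustration.

// Row-level check: A must be a one-character string, B must be an integer.
function isValidRow(row) {
  const aOk = typeof row.A === "string" && row.A.length === 1;
  // CSV values arrive as text, so accept digit-only strings as integers.
  const bOk = /^-?\d+$/.test(String(row.B).trim());
  return aOk && bOk;
}

// File-level flag, mirroring a Group by on the file name:
// a single invalid row marks the whole file as "Not Valid".
function statusPerFile(rows) {
  const status = new Map();
  for (const row of rows) {
    const previous = status.has(row.filename) ? status.get(row.filename) : "Valid";
    const current = previous === "Valid" && isValidRow(row) ? "Valid" : "Not Valid";
    status.set(row.filename, current);
  }
  return status;
}

// Made-up sample rows from two files.
const rows = [
  { filename: "good.csv", A: "x", B: "12" },
  { filename: "good.csv", A: "y", B: "-3" },
  { filename: "bad.csv", A: "xy", B: "7" },  // A is too long
  { filename: "bad.csv", A: "z", B: "1.5" }, // B is not an integer
];

const result = statusPerFile(rows);
console.log(result.get("good.csv")); // Valid
console.log(result.get("bad.csv"));  // Not Valid
```

The status of a file is only final once all of its rows have been seen, which is why the move itself belongs in a later job entry rather than in the same stream of rows.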