
How to dynamically read files defined by a list of paths with Talend?


First of all, I'm new to Talend. This is a mock project I'm doing for training purposes.

Here's the context: I have a CSV file (simply called "CSV_IN") that contains a list of paths to different files holding the information I need to retrieve. I can't change the files or the file tree, so I must retrieve the files through the paths listed in "CSV_IN".

So, in Talend Open Studio, I made something like this:

[Image: screenshot of the Job design]

It seems that the "tJavaRow" components are called for each row of the CSV, giving an Iterate kind of output instead of a Flow. But if I try to use a trigger directly from there, it waits for all the rows to finish and then fires only once.

The thing is, I need a subjob to run for each path read from CSV_IN: it should open the file the path points to and do some work on it (for now, I simply print the content).

So in the green section, I iterate over the output and fire an "OnComponentOk" trigger for each path. The "tJava_1" component does literally nothing else.

But when I run the Job, I get this:

[Image: screenshot of the Job's run output]

The blue subjob runs 4 times, which is the number of paths I have in CSV_IN. But why is the content null?

If I print the context variable instead, I get my 4 paths, as expected.

I feel like the whole Job is too... MacGyver-ish... Is there a better way to do this?

EDIT: if I use a "tJavaRow" instead of a "tJava", I can use "input_row" to print the file. But I can't do what I want with a "tJavaRow"... Anyway, that's another problem for another time...

But the question remains: is this the "proper" way of doing it?


Solution

  • You are on the right track here.

    First, you need to get the data out of the CSV file.

    Then, iterate over the rows and do whatever needs to be done in the job.

    The part I would do differently is the tJava component with the OnComponentOk trigger. Instead, I'd do it like this:

    inputCSV_1 --- iterate over file path / name --> inputCSV_2 (with data from the first inputCSV) -- row for doing your stuff --> write or end

    You can access the file names read by the first inputCSV through the row variable (e.g. row1.filename) and use them as the file to open in the second inputCSV. No OnComponentOk needed.
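
To make the data flow of that chain concrete, here is a minimal plain-Java analogy of it. This is a sketch, not Talend-generated code: the class name `PathListJob` and the methods `readPaths` and `processAll` are hypothetical, and it assumes CSV_IN holds one path per line with no header row.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Plain-Java analogy of: inputCSV_1 --iterate--> inputCSV_2 --row--> output.
// All names here are hypothetical; this is not what Talend generates.
class PathListJob {

    // Role of inputCSV_1: read the list of paths.
    // Assumed layout: one path per line, no header; blank lines are skipped.
    static List<String> readPaths(Path listFile) throws IOException {
        List<String> paths = new ArrayList<>();
        for (String line : Files.readAllLines(listFile, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }

    // Role of the iterate link plus inputCSV_2: for each path, open that
    // file and pass every row on, prefixed with the path it came from.
    static List<String> processAll(Path listFile) throws IOException {
        List<String> output = new ArrayList<>();
        for (String path : readPaths(listFile)) {
            for (String row : Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8)) {
                output.add(path + ": " + row);
            }
        }
        return output;
    }

    // Usage: java PathListJob <path-to-CSV_IN>
    public static void main(String[] args) throws IOException {
        for (String line : processAll(Paths.get(args[0]))) {
            System.out.println(line);
        }
    }
}
```

The point of the nested loop is that the inner read starts once per path, with the current path available as a plain value, so no trigger is needed. In Talend itself, one common way to get the same effect (an assumption about your setup, not something taken from your screenshots) is a tFlowToIterate component between the two inputs, with the second input's file name set to an expression such as `((String)globalMap.get("row1.filename"))`.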