I have a job in Talend. The goal of that job is to transform 18 tables (with 2 million records each) into one. The 18 tables share 2 columns.
So, while working on it I ran into these problems:
1) I cannot complete the job when connecting the 18 tables at once (memory error). My job looks like this, but with many more connections:
2) I tried to connect only half of them, but it runs forever (8 hours and still counting), which is not efficient at all:
3) I tried to split the job into several small ones, but still without success. I am stuck here.
Any recommendations on how to optimize this job?
Thanks so much for reading and double thanks for answering.
You can easily have dozens of lookups in your job if you optimize the way they're used. You can do the following in order to optimize your job:
Instead of having a single tMap with lots of lookups, you can split it into multiple tMaps like this:
           lkp_1      lkp_2      lkp_3              lkp_y
             |          |          |                  |
Source --- tMap_1 --- tMap_2 --- tMap_3 --- ... --- tMap_x --- target
This is not mandatory, but this way you can easily modify your lookups.
Next, in order to optimize memory use, you can take a look at the "reload at each row" option of tMap. Instead of using the default "load once", which loads your whole lookup table into memory, you can use "reload at each row" in order to execute the lookup query for the current row:
In your lookup query, you have access to a global variable defined in your tMap, for instance (Integer)globalMap.get("myLookupKey"), in order to filter your data on the database side and only return the value that matches your lookup key.
Here is a detailed example.
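To make that concrete, below is a minimal plain-Java sketch (not from the original post) of what "reload at each row" does conceptually: for each incoming row, the join key is pushed into globalMap and the lookup query is re-executed with that key, so the database only returns the matching row instead of the whole 2-million-row table. The table name customer, the columns id and name, and the key "myLookupKey" are hypothetical.

    import java.util.HashMap;
    import java.util.Map;

    // Plain-Java simulation of the "reload at each row" lookup pattern.
    public class ReloadAtEachRowSketch {

        // Stands in for Talend's globalMap, which is shared between components.
        static final Map<String, Object> globalMap = new HashMap<>();

        public static void main(String[] args) {
            int[] incomingKeys = {101, 102, 103}; // join keys arriving on the main flow

            for (int key : incomingKeys) {
                // In the tMap lookup settings, you map the incoming join key
                // to a globalMap entry (here called "myLookupKey").
                globalMap.put("myLookupKey", key);

                // In the lookup DB input component, the query uses that key so
                // only the matching row is fetched instead of the whole table:
                String lookupQuery =
                        "SELECT id, name FROM customer WHERE id = "
                        + (Integer) globalMap.get("myLookupKey");

                System.out.println(lookupQuery); // executed once per incoming row
            }
        }
    }

The trade-off is one query per incoming row, so this is mainly worth it when the lookup table is too big to hold in memory; otherwise "load once" is usually faster.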
There's also the "Store temp data" option for lookups, which optimizes memory usage, as data from lookup tables is stored on disk instead of in memory.