I am running my Talend job from Windows Task Scheduler at a 15-minute interval. The job exports data from HBase into PostgreSQL. Every scheduled run reinserts the records that earlier runs already inserted: the 2nd run inserts the 1st run's records again, and so on.
HBase schema -> id int, name string
PostgreSQL schema -> id int, name varchar(100), with an index created on the (id) column.
Example:

Records available at each scheduled run:

1st schedule | 2nd schedule
id  name     | id  name
1   abcd     | 4   bbbb
2   efgh     | 5   cccc
3   hjkl     | 6   eeee

Contents of the PostgreSQL table after scheduling:

My output    | Expected output
id  name     | id  name
1   abcd     | 1   abcd
2   efgh     | 2   efgh
3   hjkl     | 3   hjkl
1   abcd     | 4   bbbb
2   efgh     | 5   cccc
3   hjkl     | 6   eeee
4   bbbb     |
5   cccc     |
6   eeee     |
Thanks in advance!
You have to use your PostgreSQL target table as a lookup and check for existing data, so that only rows whose id is not already in the target get inserted. Your flow should be as below:
source --> Expression --> Target
              ^
              |
           Lookup (to check existing data)
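To make the lookup idea concrete, here is a minimal sketch in plain Python of what the Expression step has to decide. It is not Talend code, and the function name `filter_new_rows` is hypothetical; it only assumes the (id, name) schema and the sample rows from the question, with id as the key used for the lookup.

```python
from typing import Iterable, List, Set, Tuple

Row = Tuple[int, str]  # (id, name), matching the schema in the question


def filter_new_rows(source_rows: Iterable[Row], existing_ids: Set[int]) -> List[Row]:
    """Return only the source rows whose id is not already in the target."""
    seen = set(existing_ids)  # copy so the caller's set is not mutated
    new_rows = []
    for row_id, name in source_rows:
        if row_id not in seen:
            new_rows.append((row_id, name))
            seen.add(row_id)  # also skips duplicates inside the same batch
    return new_rows


# Sample data from the question: the target already holds the 1st run's rows.
target = [(1, "abcd"), (2, "efgh"), (3, "hjkl")]
# The 2nd run reads everything from the source again.
second_run = [
    (1, "abcd"), (2, "efgh"), (3, "hjkl"),
    (4, "bbbb"), (5, "cccc"), (6, "eeee"),
]

# Lookup: collect the ids that already exist in the target.
target_ids = {row_id for row_id, _ in target}
# Insert only the rows that the lookup did not find.
target.extend(filter_new_rows(second_run, target_ids))
print(target)
```

With the sample data, only ids 4, 5 and 6 pass the filter, so the target ends up with the six rows from your expected output instead of nine.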
Let me know if you need more assistance on this. This is a quick and easy task.