What is the best method for loading incremental data into a Hive/Impala table?
I followed the steps below but couldn't succeed.
The above method works fine when I run the Oozie workflows sequentially. When I invoke multiple jobs at the same time, it hangs while loading the data.
I can't load the data sequentially. How can I make this more effective, so that I can run parallel jobs that load data at the same time?
In our case, the incremental data goes into a new partition of the Hive table every time. So, in step 3 (of the steps mentioned above), we simply add a new partition to the table.
When multiple workflows run in parallel, it should work fine as long as each of them loads data into its own new partition.
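
As a rough sketch of the "add a new partition" step, each workflow run could generate its own HiveQL statement along these lines. The table name `sales_incremental`, the partition column `load_id`, and the HDFS paths are hypothetical placeholders, not taken from the original steps; the helper only builds the statement text and does not run it.

```python
def add_partition_statement(table, partition, location):
    """Build a HiveQL statement that registers one new partition.

    `partition` maps partition column names to values; all names used
    in the example below are hypothetical.
    """
    spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({spec}) LOCATION '{location}'"
    )


# Two hypothetical workflow runs, each targeting its own partition,
# so neither run writes into the other's directory.
for run_id in ("run_001", "run_002"):
    print(add_partition_statement(
        "sales_incremental",
        {"load_id": run_id},
        f"/data/sales_incremental/load_id={run_id}",
    ))
```

If the table is also queried from Impala, its metadata would likely need a REFRESH on that table after the partition is added through Hive.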