I'm using flat file data sources with the incremental load functionality and am seeing different performance depending on how I do the load. I have three datasets {d1, d2, d3}, with d1 and d2 being the same size and d3 being three times larger. I am doing the following test on a machine with 16GB of memory:
On the other hand, if I do a single load of d1+d2+d3, the total time is 5m29s and there are no memory issues.
Is this just a matter of memory overhead when doing an incremental load vs. a single load, or should I be managing the performance differently?
Incremental load was implemented to support real-time scenarios, and it does not follow the same logic as a normal load.
Additional data is pre-loaded into memory, which is why it takes more memory. During this pre-load the schema is still available; once the new data is fully pre-loaded and a first quality check has passed, the schema is write-locked and the actual load is done. This allows the schema to be locked for only a few milliseconds.
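For illustration, here is a minimal sketch of that pre-load/write-lock pattern using Java's ReentrantReadWriteLock. This is not the actual implementation, just the general technique: the expensive work (parsing, quality checks) happens without any lock held, and the write lock is taken only for the cheap final append.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Generic sketch of the pattern described above, not the product's code:
// new rows are staged and validated off to the side while readers keep
// querying, and the write lock is held only for the final swap-in.
class IncrementalTable {
    private final List<String[]> rows = new ArrayList<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Queries take the read lock; many can run concurrently.
    int rowCount() {
        lock.readLock().lock();
        try {
            return rows.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Incremental load: parsing and the quality check happen before the
    // write lock is taken, so the schema is locked only for milliseconds.
    void incrementalLoad(List<String[]> staged) {
        validate(staged);            // quality check, no lock held
        lock.writeLock().lock();     // schema write-locked...
        try {
            rows.addAll(staged);     // ...just for the cheap append
        } finally {
            lock.writeLock().unlock();
        }
    }

    private void validate(List<String[]> staged) {
        for (String[] row : staged) {
            if (row == null) throw new IllegalArgumentException("bad row");
        }
    }
}
```

The trade-off is that the staged data and the live data coexist in memory until the append completes, which is exactly where the extra memory footprint of an incremental load comes from.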
The incremental load is suitable for real-time, 'small' amounts of additional data, not really for your scenario.
Could the slow times be due to the fact that you're running out of memory (i.e., a lot of GCs)?
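One way to check is to sample the JVM's garbage-collector counters around the load, using the standard java.lang.management API (nothing product-specific; the load itself is a placeholder here):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Samples cumulative GC time before and after a piece of work, so you can
// see how much of a slow load was actually spent in garbage collection.
public class GcCheck {
    public static void main(String[] args) {
        long gcBefore = totalGcTimeMillis();
        long wallBefore = System.nanoTime();

        // ... run the incremental load here (placeholder) ...

        long wallMillis = (System.nanoTime() - wallBefore) / 1_000_000;
        long gcMillis = totalGcTimeMillis() - gcBefore;
        System.out.println("Wall time: " + wallMillis + " ms, GC time: " + gcMillis + " ms");
    }

    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // cumulative ms; -1 if unavailable
            if (t > 0) total += t;
        }
        return total;
    }
}
```

If GC time accounts for most of the wall time, that would suggest the process is memory-bound and a larger heap (or the single load) is the better option.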
Hope that helps.
PS: If you need additional support, please contact support directly.