
icCube incremental vs single load performance


I'm using flat file data sources with the incremental load functionality and am seeing different performance depending on how I do the load. I have three datasets {d1, d2, d3}, where d1 and d2 are the same size and d3 is three times larger. I am running the following test on a machine with 16 GB of memory:

  1. Load d1 - time: 1m07s
  2. incrementally load d2 - time: 2m53s
  3. incrementally load d3 - runs out of memory

On the other hand, if I do a single load of d1+d2+d3, the total time is 5m29s and there are no memory issues.

Is this just a matter of memory overhead when doing incremental vs single loads, or should I be managing the load differently?


Solution

  • Incremental load was implemented to support real-time updates, and it does not follow the same logic as a normal load.

    The additional data is first pre-loaded into memory, which is why it uses more memory. During this pre-load the schema remains available; once the new data is fully pre-loaded and an initial quality check has passed, the schema is write-locked and the actual load is performed. This keeps the schema locked for only a few milliseconds.

    The incremental load is suitable for real-time scenarios with a 'small' amount of additional data, not really for your scenario.

    Are you sure the slow times are not due to running low on memory (i.e., a lot of GC activity)?

    Hope that helps.

    PS: If you need additional support, please contact support directly.
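
The staged load described in the answer can be sketched as follows. The class and method names here are hypothetical, not icCube's actual API; the point is only to illustrate the pattern: the expensive preparation happens while the old schema is still being served, and the write lock is held just for the final swap.

```python
import threading

class Schema:
    """Minimal sketch (hypothetical names, not icCube's API) of the staged
    incremental-load pattern: new data is prepared while readers keep using
    the current schema, and the write lock is held only for the final swap."""

    def __init__(self, rows):
        self._rows = list(rows)        # data currently served to queries
        self._lock = threading.Lock()  # write lock guarding the swap

    def query(self):
        # Readers always see a consistent snapshot of the schema.
        return self._rows

    def incremental_load(self, new_rows):
        # 1) Pre-load: parse and quality-check the new data while the schema
        #    stays fully available. The old and new data coexist in memory
        #    here, which is the extra memory cost of an incremental load.
        staged = self._rows + list(new_rows)
        # 2) Swap: the schema is write-locked only for this brief assignment.
        with self._lock:
            self._rows = staged
```

Step 1 is also why a large incremental batch can exhaust memory: until the swap happens, both the live schema and the staged copy are resident at the same time, whereas a single load only ever builds one copy.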