
Why is loading dashDB analytics by trickle feed a bad idea?


I have a use case where I need to continuously trickle-feed data into dashDB; however, I have been informed that this is not optimal for dashDB.

Why is this not optimal? Is there a workaround?


Solution

  • Columnar warehouses are optimized for reads, but inserting a single row into an N-column table forces the system to cut the row into pieces and perform N separate writes to disk. This makes small inserts relatively inefficient, and throughput degrades as a result.

    You may want to do an initial batch load of data. Currently the compression dictionary is built only during bulk loads, so if you start with a new table and populate it only via inserts, the data does not get compressed at all.

    Try to structure the loading into microbatches with a 2-5 minute load cycle.
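To see why a single-row insert is expensive in a columnar layout, here is a minimal, purely illustrative sketch (not dashDB's actual storage engine): each column lives in its own array, so one inserted row into an N-column table turns into N separate per-column writes.

```python
class ColumnarTable:
    """Toy model of columnar storage, for illustration only."""

    def __init__(self, columns):
        self.store = {name: [] for name in columns}  # one array per column
        self.writes = 0  # count of per-column write operations

    def insert_row(self, row):
        # The row is cut into N pieces; each piece is appended to a
        # different column array, i.e. N separate writes for one row.
        for name, value in zip(self.store, row):
            self.store[name].append(value)
            self.writes += 1

table = ColumnarTable(["id", "name", "amount"])
table.insert_row((1, "alice", 9.99))
print(table.writes)  # 3 writes for a single 3-column row
```

A row store would append the whole row in one write; here every trickle-fed row pays the per-column cost, which is the overhead the answer describes.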
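The micro-batch cycle above can be sketched as a small buffering layer: accumulate trickle-fed rows and flush them as one bulk operation either on a timer or when a row threshold is hit. The `flush_fn` callback and the thresholds are assumptions for illustration; in practice it would issue the bulk LOAD/INSERT against dashDB.

```python
import time

class MicroBatcher:
    """Buffer trickle-fed rows and flush them as one bulk write,
    either every `interval_s` seconds or once `max_rows` accumulate.
    `flush_fn` is a hypothetical callback that would perform the
    actual bulk load; here it just receives the batch."""

    def __init__(self, flush_fn, interval_s=180, max_rows=10000):
        self.flush_fn = flush_fn
        self.interval_s = interval_s   # e.g. a 2-5 minute load cycle
        self.max_rows = max_rows
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, row):
        self.buffer.append(row)
        timer_due = time.monotonic() - self.last_flush >= self.interval_s
        if timer_due or len(self.buffer) >= self.max_rows:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)  # one bulk write instead of many
            self.buffer = []
        self.last_flush = time.monotonic()

batches = []
b = MicroBatcher(batches.append, interval_s=300, max_rows=3)
for i in range(7):
    b.add({"id": i})
b.flush()  # drain whatever is left at shutdown
print([len(batch) for batch in batches])  # [3, 3, 1]
```

The key design point is that the database only ever sees a few large writes per cycle rather than a stream of single-row inserts, which sidesteps the per-row write amplification.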