Search code examples
timestampbulk-loadquestdb

Bulk out-of-order data import into QuestDB


I'm looking into using QuestDB for a large amount of financial trade data.

I have read and understood https://questdb.io/docs/guides/importing-data but my case is slightly different.

  • I have trade data for multiple instruments.
  • For each instrument, the microsecond-timestamped data spans several years.
  • The data for each instrument is in a separate CSV file.

My main use case is to query for globally time-ordered sequences of trades for arbitrary subsets of instruments. For clarity, the results of a query would look like

timestamp,instrument,...
0001,A,...
0003,B,...
0004,C,...
0004,A,...
0005,B,...

Hence I believe I need to put all of the trades in a single table with a designated timestamp, and an index on the instruments. I don't necessarily need any of the time series processing features of QuestDB right now, but I do need the sort by timestamp not to run out of memory.

My questions are (beyond whether this approach makes sense in general):

  1. Is there an efficient (and easy) way to merge the data into a single table, given that each CSV file spans years?

  2. Will I need to rebuild the table every time I add a new instrument in the future?


Solution

  • As of 6.0 you can simply append the CSVs to same table one by one given the table has designated timestamp and partitioned it will work.

    If your CSVs are huge I think batching them in transactions with few million rows will be better than offloading billions at once.

    Depending of how much data you have and your box memory you need to partition in a way that single partition fits memory several times. So you choose if you want daily or monthly partitions.

    Once you decide with partitioning you can speed up the upload if you able to upload day by day batches (or month by month) from all CSVs.

    You will not need to rebuild the table every time you add an instrument, table will be rewritten automatically partition by partition when you insert records out of order.