TorQ: How to update disk database populated with .loader.loadallfiles?


I populate an on-disk database from large CSV files using TorQ's .loader.loadallfiles in a cumulative fashion, and it works great. However, I now also need to append data coming from a streaming source, and I'm not sure of the best way to do it.

I know how to update or append data to an in-memory database. However, I do not know what API there is to consistently apply the delta updates to the disk database previously populated with .loader.loadallfiles.

I call .loader.loadallfiles like this, for example:

rawdatadir:hsym `$("" sv (getenv[`KDBRAWDATA]; "fwdcurve"));
.loader.loadallfiles[`headers`types`separator`tablename`dbdir`partitioncol`partitiontype!(`date`ccypair`ftype;"ZSS";enlist ",";`fwdcurve;target;`date;`month); rawdatadir];

Solution

  • The best idea, as Jonathon commented, is to maintain an RDB for storing the data from your streaming source. When kdb+ saves data to disk it writes entire columns in one go, so given 1000 records with 5 columns, it is better to ask it to save 5 lists of 1000 entries each than to ask it, 1000 times over, to save 5 columns of one entry each.

    To illustrate the difference in time, suppose I have two on-disk lists x and y. Upserting 10000 elements at once is very fast:

    q)\t `:x upsert 10000#1
    0
    

    Doing them one at a time is much slower (\t:10000 repeats the expression 10000 times):

    q)\t:10000 `:y upsert 1
    126
    

    It might be worth looking into using the full TorQ framework. It's designed specifically for this kind of situation. It has RDB and HDB functionality and can be found at http://aquaqanalytics.github.io/TorQ/

    If you wish to append data as you describe, there currently isn't an API to do that. What you can do is modify the RDB or WDB so that it appends to the database when it writes. Using .loader.writedatapartition followed by a call to .loader.finish should be helpful, I think.
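    As a rough illustration of the buffer-then-append idea, here is a minimal sketch in plain q that does not use any TorQ functions. The names buf, upd and flush, the database root `:hdb, and the assumption that the on-disk table has exactly the columns date, ccypair and ftype are all hypothetical choices for this example. It does not re-sort the partition or reapply attributes after appending, so treat it as a starting point rather than a finished design.

    ```q
    / hypothetical database root; replace with your own dbdir
    dbdir:`:hdb

    / in-memory buffer with the same schema as the question's fwdcurve table
    buf:([] date:`datetime$(); ccypair:`symbol$(); ftype:`symbol$())

    / called for each streamed record or small batch: appends in memory only
    upd:{[data] `buf upsert data;}

    / called on a timer or size threshold: one disk append per month partition
    flush:{[]
      if[0=count buf; :()];
      t:.Q.en[dbdir; buf];                        / enumerate symbols against the sym file in dbdir
      months:`month$t`date;                       / partition value for every buffered row
      {[t;months;m]
        path:` sv .Q.par[dbdir; m; `fwdcurve],`;  / splayed table directory for partition m
        path upsert t where months=m;             / append all rows for this partition in one write
        }[t;months] each distinct months;
      delete from `buf;                           / empty the buffer once everything is written
      }
    ```

    A record would then be buffered with something like upd (2020.01.15T09:30:00.000; `EURUSD; `fwd), and flush[] called periodically. Any HDB process reading the database would need to reload it to see the new rows.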