Get a subset of a large tplog file without loading the complete file into memory in kdb


I have a tplog file of a few hundred GB that contains logs for many tables: 'trade', 'quote', etc.
I want to write only the records of the trade table to a new tplog file (tradeTpLog) on disk.
Because the tplog file is too big to load at once, I was thinking of reading a single record or a chunk of records into memory, checking whether the table is 'trade', and if so appending the record to the tradeTpLog file on disk.

Test TpLog file records:

2#get `:./sym2020.05.13
((`upd;`trade;(20:46:39.781823000 20:46:39.781823000;`GS.N`BA.N;178.5163025 128.0462196;798 627j));(`upd;`quote;(20:46:39.782805000 20:46:39.782805000;`IBM.N`VOD.L;191.0897744 341.2843914;191.1130483 341.3052296;564 807j;886 262j)))

I am aware of -11!, which can be given a number n of elements to replay, but I am not sure whether or how it can be used in this case.

Unsuccessful attempt:

{if[`trade~x@1;`:./tradeFile upsert x]}@'25#get `:./sym2020.05.13

Solution

  • You can't selectively extract rows from the log, but you can redefine upd so that the replay only acts on trade messages. To build a new log file you can't use upsert; instead, create an empty log file, open a handle to it, and append each message through that handle.

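    // Preserve the original upd (plain : here; at the top level :: would define a view, not a copy)
    upd_old:upd;
    // Create a new, empty log file and open a handle to it
    `:tradeLog set ();
    h:hopen `:tradeLog;
    // Define a new upd that appends only trade messages to the new log
    upd:{[t;x] if[t~`trade;h enlist (`upd;t;x)]};
    // Replay the log file; -11! calls upd once per logged message
    -11!`:./sym2020.05.13;
    // Close the new log and restore the original upd
    hclose h;
    upd:upd_old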
    

    Using -11! will be much faster than get, and it streams the log one message at a time instead of loading the whole file into memory.

    If you want to, you can replay only the first n messages via -11!(n;x), where x is the log file.
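
As a rough sketch of the -11!(n;x) form, the snippet below builds a tiny sample log, replays only its first two messages through a filtering upd, and then counts what ended up in the new log. The file names sampleLog and sampleTradeLog and the sample rows are made up for illustration, and the snippet overwrites any upd already defined in the session. It also uses -11!(-2;x), which counts the valid messages in a log without executing them.

```q
// Hypothetical sample log with the same message shape as in the question
src:`:sampleLog;
src set ();
hs:hopen src;
hs enlist (`upd;`trade;(2#.z.n;`GS.N`BA.N;178.5 128.0;798 627));
hs enlist (`upd;`quote;(2#.z.n;`IBM.N`VOD.L;191.0 341.2;191.1 341.3;564 807;886 262));
hs enlist (`upd;`trade;(2#.z.n;`IBM.N`VOD.L;191.0 341.2;100 200));
hclose hs;

// Count the messages in the source log without executing them
nsrc:-11!(-2;src);

// New, empty log that will receive only trade messages
dst:`:sampleTradeLog;
dst set ();
hd:hopen dst;

// Filtering upd: append trade messages to the new log, ignore everything else
upd:{[t;x] if[t~`trade;hd enlist (`upd;t;x)]};

// Replay only the first 2 messages of the source log (one trade, one quote)
-11!(2;src);
hclose hd;

// Count the messages that were written to the new log
ndst:-11!(-2;dst);
```

Replaying a small n like this is a cheap way to check the filter before running it over the full multi-hundred-GB log with -11!x.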