kdb

How to ensure a kdb partitioned table data is not duplicated?


I created a partitioned db that is populated from daily stored files. It works fine, but I'm worried that kdb could add duplicate rows if I run the partition script twice on the same day's files.

Should I use 'key' to check for the existence of the partitioned tables, or is there a simpler way to ensure kdb will not duplicate the stored data?


Solution

  • I'm not sure if your issue is that 1) you're afraid you'll load the same file more than once, or 2) multiple files may contain the same data, so you don't want subsequent loads to create duplicates.

    For 1), if the daily stored files that you use to create the DB are never modified after they are written and have unique names, you could track which files have already been loaded and skip these on subsequent runs.

    For 2), even though you cannot physically key a partitioned table, you probably have certain "key" columns, e.g. sym, date, time, side etc. You can check whether the "key" values of the chunk you are currently loading already appear in the date partition. If they do, drop those records and keep the others.
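
For case 1), a minimal sketch of the file-tracking idea could look like the following. The log file name `loaded_files.txt` and the helper names `loadedFiles`, `markLoaded` and `loadOnce` are made up for illustration, and file names are held as symbols. It uses `key` on the log's file handle to see whether the log exists yet.

```q
/ Hypothetical log of file names that have already been loaded
logFile:`:loaded_files.txt

/ Names recorded so far; an empty symbol list if the log does not exist yet
loadedFiles:{$[()~key logFile; `symbol$(); `$read0 logFile]}

/ Record a file name in the log (rewrites the whole log, fine for short lists)
markLoaded:{[f] logFile 0: string distinct loadedFiles[],f}

/ Run loader on file f only if f is not in the log
/ Returns 1b if the loader ran, 0b if the file was skipped
loadOnce:{[loader;f]
  if[f in loadedFiles[]; :0b];
  loader f;
  markLoaded f;
  1b}

/ Stand-in loader for illustration; a real one would parse f and write the partition
loader:{[f] -1 "loading ",string f;}

loadOnce[loader; `$"trades_2024.01.02.csv"]   / runs the loader if the name is not in the log
loadOnce[loader; `$"trades_2024.01.02.csv"]   / skipped, the name is now in the log
```

The file is only marked as loaded after the loader returns, so a load that throws an error is retried on the next run. This does not protect against a loader that fails after it has already written part of the data.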
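
For case 2), the check against the date partition can be sketched as below. The key columns `sym`, `time` and `side` and the function name `dropSeen` are assumptions, and the two in-memory tables are toy stand-ins. In a real database `existing` would more likely come from a query on the partition being written, something like `select sym,time,side from trade where date=d` with a hypothetical table `trade`. The date itself is implied by the partition, so it is left out of the key here.

```q
/ Columns that together identify a row (an assumption; pick what fits your data)
keyCols:`sym`time`side

/ Rows of chunk whose key columns do not already appear in existing
dropSeen:{[existing;chunk;kc] chunk where not (kc#chunk) in kc#existing}

/ Toy data standing in for the on-disk partition and the incoming chunk
existing:([] sym:`a`b; time:09:30:00 09:31:00; side:`B`S; px:10 11f)
chunk:([] sym:`b`c; time:09:31:00 09:32:00; side:`S`B; px:11 12f)

/ Only rows with an unseen key are kept for appending to the partition
fresh:dropSeen[existing; chunk; keyCols]
```

In this toy data the `b` row matches an existing row on all three key columns, so only the `c` row should be left in `fresh`. Note that if the chosen columns are not truly unique, legitimately distinct records that share the same key values would be dropped as well.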