Why can't a keyed table be splayed in kdb?


A keyed table is just a dictionary that maps one table (the key columns) to another (the value columns), for example:

q)kts:([] sym:`GOOG`AMZN`FB)!([] px:3?10.; size:3?100000)
q).Q.dpft[`:/path/db;.z.d;`id;`kts]
    'nyi
    [0]  .Q.dpft[`:/path/db;.z.d;`id;`kts]

Why is there a limitation that keyed tables cannot be splayed or partitioned?


Solution

  • I think the simplest answer has two parts, one technical and one logical.

    Technical: the on-disk format currently has no way to record which columns are keys. The .d file lists the columns in their on-disk order and carries no further metadata. In theory this could be changed at a later date.

    The logical answer comes from the size of the data in question. Splayed tables are typically used when you want to hold only a few columns in memory. A decade ago this meant that splayed tables were useful for holding up to 100M rows, but with kdb+ 3.x and modern memory that upper limit can be well north of 250M. I don't think there's a good way to make a keyed lookup (effectively a join) at that scale performant in ad-hoc calculation. The grouped-attribute index needed to make that work is around the same size as the column on disk and would need to be constantly rewritten as data is appended.

    I think the use of 'nyi in this case to mean "we probably need to think about this one for a bit" is appropriate.

    The obvious solution is to look at explicit row relationships via linking columns, where the lookup calculation is done ahead of time.
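
As a rough illustration of the two points above, here is a minimal sketch, not a definitive recipe. It assumes a scratch directory /tmp/linkdemo and made-up table and column names (ref, trade, refrow). It splays the reference data as a plain, unkeyed table, shows that the .d file holds nothing but the column names, and then builds a link column so that the row lookup is computed once rather than at query time.

```q
/ Reference data held as a plain (unkeyed) table; names and values are illustrative
ref:([] sym:`GOOG`AMZN`FB; px:1.5 2.5 3.5; size:100 200 300)

/ Splay it under a hypothetical scratch directory
/ .Q.en enumerates the symbol column against /tmp/linkdemo/sym
`:/tmp/linkdemo/ref/ set .Q.en[`:/tmp/linkdemo] ref

/ The .d file is only the list of column names, in order
get `:/tmp/linkdemo/ref/.d        / `sym`px`size

/ A second table that refers to rows of ref
trade:([] sym:`FB`GOOG`FB`AMZN; qty:10 20 30 40)

/ Link column: the row index into ref is found once, ahead of time
trade:update refrow:`ref!ref[`sym]?sym from trade

/ Dot notation follows the stored row index into ref
select sym, qty, refrow.px from trade
```

The link column stores row positions in ref, so it has to be recomputed if the rows of ref are reordered or removed.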