
kdb+/q: When should you actually reload the HDB?


In most kdb systems it's typical to reload the HDB (\l .) after any data persistence has occurred. With larger HDBs this reload can be costly, especially when there are multiple flat files.

With a standard TAQ database, what are some best practices for deciding when to actually reload the HDB?

Some thoughts

  1. New date partition - no need to reload
  2. Splayed table changes - no need to reload. Tables are mapped when the executing process tries to access them
  3. New enumeration - no need to reload
  4. Updates to flat files - reload

Solution

  • Responding to your thoughts:

    • New date partition - no need to reload (not necessarily true)
    • Splayed table changes - no need to reload. Tables are mapped when the executing process tries to access them (true, unless the table is brand new)
    • New enumeration - no need to reload (not true)
    • Updates to flat files - reload (true)

    My own thoughts:

    A simpler way to look at it might be to understand exactly what kdb picks up when a database is loaded. Anything that is not memory-mapped is read fully into memory and would therefore need to be reloaded if/when the underlying data changes. When kdb loads a database, it records and maps the following:

    • the range of partitions available (most commonly date partitions). Assuming date partitioning, the list of available dates is stored in memory as the date variable and also as .Q.PV. If a new date slice is added on disk after the database is loaded, kdb won't know about it without a full reload.
    • the list of tables within the partitions, stored in memory as .Q.pt. If a new table is added within a partition after the initial load, kdb won't know about it without a full reload (or a hackier manual load, which is probably not recommended).
    • the splayed tables in the root of the database. Again, if a new splay is added after the initial load, kdb won't know about it without a full reload (or a manual load of the individual splay).
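The first two points can be checked before deciding on a reload. Below is a minimal, self-contained sketch, assuming an unsegmented (no par.txt) date-partitioned HDB. The directory demohdb, the tables trade and quote and the helper unseen are hypothetical names, and the "writer" runs in the same process only to keep the example standalone (in practice it would be a separate process, e.g. an end-of-day save). It compares the date-named directories and the table list of the newest partition on disk with date and .Q.pt, and reloads only if something new has appeared.

```q
/ build a throwaway one-day HDB (hypothetical path and table names)
t:([] sym:`a`b; price:1 2f)
`:demohdb/2024.01.01/trade/ set .Q.en[`:demohdb] t
\l demohdb

/ simulate a writer persisting after the load: a new date plus a brand-new table
`:2024.01.02/trade/ set .Q.en[`:.] t
`:2024.01.01/quote/ set .Q.en[`:.] t          / new table written to every partition
`:2024.01.02/quote/ set .Q.en[`:.] t

/ hypothetical helper: what is on disk that the loaded HDB cannot see yet
unseen:{
  d:asc p where not null p:"D"$string key `:.;      / root entries that parse as dates
  newDates:d except date;                           / partitions added since the load
  newTabs:(key hsym `$string last d) except .Q.pt;  / tables in the newest partition
  `dates`tables!(newDates;newTabs)}

u:unseen[]
show u                                        / dates: 2024.01.02, tables: `quote
if[any 0<count each value u; system "l ."]    / full reload only when something is new
show date                                     / 2024.01.01 2024.01.02
```

Comparing against date rather than reloading after every save limits the cost of \l . to the cases where it actually changes something. A table added only to the newest partition would also need filling into older partitions (for example with .Q.chk) before it can be queried across all dates, which is why the sketch writes quote to both.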

    Everything else (flat tables, on-disk dictionaries, sym files) is loaded fully into memory and so requires either a full reload or a reload of the individual objects if/when it changes. One last point: there are ways to force kdb to "see" new objects without a database reload, by modifying .Q.pt, .Q.cn, .Q.PV, date etc., but this is an undocumented and murky area.
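For the fully-loaded objects, a targeted refresh with get can be much cheaper than a full \l . reload. The sketch below is a minimal, self-contained illustration, assuming a hypothetical HDB directory demohdb2 containing a flat table named ref. The sym file is overwritten directly only to mimic another process extending the enumeration; a real writer would normally do this through .Q.en.

```q
/ build a throwaway HDB with one partition and a flat (serialised) table in the root
t:([] sym:`a`b; price:1 2f)
`:demohdb2/2024.01.01/trade/ set .Q.en[`:demohdb2] t
`:demohdb2/ref set ([] id:1 2; name:`x`y)     / flat file: read fully into memory by \l
\l demohdb2

/ simulate another process replacing the flat file and appending to the sym file
`:ref set ([] id:1 2 3; name:`x`y`z)
`:sym set get[`:sym],`c

show count ref      / 2: the in-memory copy loaded by \l is now stale
show count sym      / 2

/ refresh only the objects that changed instead of reloading the whole database
ref:get `:ref
sym:get `:sym

show count ref      / 3
show count sym      / 3
```

Reassigning sym works here because enumerated columns on disk store indices into the sym list, so an appended list still resolves every existing value.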