Search code examples
databasesymbolskdb+

Does the sym column in a table play a special role?


Often in kdb+ setups I see that sym is a column, which is very non-descriptive of what the data actually is; in finance this will often be ticker for example, in IoT/robotics you might want sensor etc. In the prototypical capture setup you will see date time sym as the first three columns; the db is partitioned on date and ordered by time ("date" as a name is derived from the partition type, "time" can be anything although the obvious choice). My question is why does sym play a similarly prominent/important role as the other two in code/literature?


Solution

  • Part of the importance of the sym column is actually in how data is often stored on disk. While partitioned databases are often parted by date, it is actually quite common to sort tables within a date partition by sym, instead of time. When sorted by sym instead of time, you can apply the parted attribute to the sym column, which dramatically increases querying speed of historical databases if you are filtering by sym.

    As I understand it the use of the word 'sym' is just because it's an abbreviation of symbol. It's prevalence is partially just convention, however it is also used as a reference to a particularly important aspect of kdb, namely the sym file. When saving splayed or partitioned tables to disk, it is necessary to enumerate any data of type `symbol. This will allow sym data to be saved to disk in a fixed-length format (of integers) rather than as variable length symbols. This also helps to massively increase querying speed when data is sorted by sym. The issue with enumeration is that it will be necessary to retain a map of what integer corresponds to what symbol, which is the all important sym file.

    Another point to note is that many kdb+tick architectures, such as the vanilla tick set-up available from Kx, enforce that the first two columns of every table be time and sym. .u.sub, the function called by a kdb process to subscribe to a feed via a tickerplant, allows a subscriber to filter the data they want to receive by sym, which necessitates the presence of a column called sym. If you wanted a db without a sym column, you'd have to make a number of modifications to your architecture.