Tags: database, vertica, columnstore

How is encoded data read from a columnar database e.g. RLE encoded data in Vertica?


In the following table, when the data in the Dept field is encoded and stored, how does the Dept value 10 for ID 2 stay associated with its Age value of 38?

ID  Dept  Age
 1    10   60
 2    10   38
 3    10   49

For a row store I understand that I look up ID 2 and get the data for the entire row.

But I am not able to wrap my head around how the data is retrieved for ID 2 when it is in a columnar store and the Dept information is in encoded form.

Could you please help me understand if I am missing something obvious?


Solution

  • First, let's consider the columnar data in unencoded form:

    The ID with value 2 is the second element in its column, so to get the Dept and Age all that needs to be done is to take the second element of each of those columns.

    Now, in encoded (compressed) form, you can look at the data in ranges. For example, Dept could be encoded as 3*10, meaning the range 1-3 has the value 10. To retrieve the second element of Dept, the database has to look for the range that holds the second element. That is the range 1-3, which has the value 10.
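As a minimal sketch of that range lookup (the function name and run layout are illustrative, not Vertica's actual implementation), the column can be stored as (run_length, value) pairs and scanned until the run covering the requested position is found:

```python
# Hypothetical sketch: looking up one element in an RLE-encoded column.
# The column is stored as (run_length, value) pairs, e.g. 3*10 -> (3, 10).

def rle_lookup(runs, position):
    """Return the value at a 1-based position by walking the runs."""
    covered = 0
    for run_length, value in runs:
        covered += run_length          # positions 1..covered are now accounted for
        if position <= covered:        # position falls inside this run
            return value
    raise IndexError("position past end of column")

dept_runs = [(3, 10)]                  # the Dept column from the question: 3*10
print(rle_lookup(dept_runs, 2))        # -> 10, the Dept for ID 2
```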

    Or looked at in another way: to get Dept at position 4 in a sequential way, the system would see there is a group of 3 elements compacted into 1 value, so the item after that group (which could be a new group of identical values) would contain the value for the fourth element.

    To speed things up, the system would of course keep an index with positions to be able to jump (semi)directly to the value of a certain range (and those ranges/groups would be stored in blocks, say 1-8 kB each - in a B+ Tree for example, especially when values need to be inserted later).

    Another possibility would be to store those groups (range values with their RLE prefix) in blocks, keep track of the first and/or last item index of each block, and decompress the block that contains the value for the index we're looking for. Then calculate the offset of the item within that decompressed block. The details depend on the type of compression used to store the data in that block.
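A rough sketch of that block-based scheme (the block layout and helper names here are made up for illustration): a small index records the first element position of each block, a binary search finds the right block, and only that block is decompressed:

```python
import bisect

# Hypothetical sketch: RLE runs grouped into blocks, with an index of the
# 1-based position of the first element in each block, so we can jump
# (semi)directly to the right block, decompress it, then take an offset.

blocks = [                      # each block holds a few (run_length, value) pairs
    [(3, 10), (2, 20)],         # covers positions 1-5
    [(4, 30)],                  # covers positions 6-9
]
first_positions = [1, 6]        # first element position of each block

def lookup(position):
    i = bisect.bisect_right(first_positions, position) - 1   # block holding position
    expanded = [v for length, v in blocks[i] for _ in range(length)]  # decompress one block
    return expanded[position - first_positions[i]]           # offset inside the block

print(lookup(2))   # -> 10
print(lookup(7))   # -> 30
```

In a real system the index over `first_positions` would itself be a structure like a B+ tree, as mentioned above, rather than a flat sorted list.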


    On the other hand, in most cases we don't need all the data from a record, and that's why column based data stores are so interesting.

    Let's consider a table with 1 billion records of 200 bytes each (string data etc.) and a system with 10 GB of free RAM (I chose the record count to be large enough that the table won't fit in memory). That's 200 GB of data.

    Now let's say we want the total sum of a certain column (a 4 byte integer column). To calculate the sum on a record based table, we have to read the value every 200 bytes, and since data is read from disk in 4kB pages we have to read the full 200GB. On a normal disk at 100MB/sec that would take 2000 seconds.
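A quick back-of-envelope check of those numbers (using decimal units, as in the text):

```python
# Back-of-envelope check of the full-table scan cost described above.
records     = 1_000_000_000        # 1 billion records
record_size = 200                  # bytes per record
disk_speed  = 100 * 10**6          # 100 MB/sec

table_bytes  = records * record_size        # 200 GB of row data
scan_seconds = table_bytes / disk_speed     # full scan at disk speed
print(scan_seconds)                         # -> 2000.0 seconds
```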

    If we have our data separated in columns, even with uncompressed data we would only have to read 4GB of data (which could already be in memory). If the column is compressed, say 10:1, that would only be 400MB. If the data is sparse (say a lot of zero or null values) that would be even less. Also to sum a range of 100 identical values all that needs to be done is 100*value instead of reading 100*200 bytes from disk, or even skip that range if they are zero.
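The "100*value instead of 100 reads" point can be sketched directly on the RLE representation (the runs below are made-up sample data): each run contributes run_length * value to the sum, and zero runs can be skipped entirely:

```python
# Hypothetical sketch: summing an RLE-encoded integer column without
# decompressing it - each run contributes run_length * value.

def rle_sum(runs):
    # Skip zero-valued runs entirely; multiply out the rest.
    return sum(length * value for length, value in runs if value != 0)

ages = [(100, 42), (50, 0), (3, 7)]   # made-up runs: 100x42, 50x0, 3x7
print(rle_sum(ages))                  # -> 4221
```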

    There is also an extra speed gain because all that data (all values being close together) would be accessed from the L1 CPU cache, which is a lot faster than accessing main memory.