Search code examples
databasevertica

In this example, how columnar database is faster compared to RDBMS


In this example, how is a columnar database like Vertica better than an rdbmsw.r.t speed?

I am assuming that Symbol and date are primary key w.r.t rdbms

enter image description here

From the example I am following, the projection looks like the following but I can't figure out why the no. of reads is less

enter image description here


Solution

  • If tickstore only has the three columns you are using in your example (and practically only then), the row-oriented DBMS might be faster - especially if you have symbol and date indexed.

    But in any other case: each column in a columnar database (and Vertica was the first widely known one) boils down to a file by itself. So , in the case of 103 columns in the table, the row-oriented DBMS reads 100 columns too many from disk, while Vertica reads only 3; the row-oriented DBMS needs to navigate through the complete read row to find the starts of each column needed in the query; and Vertica can rely on the fact that each column-file contains exactly one data type, and can optimise both reading and writing processes to take advantage of that knowledge.

    Plus there are encoding algorithms for column-oriented data that would blow up the space of this answer to go into detail for here.

    To sum it up: the only thing a column oriented database does not like is a SELECT * of a 100-plus column table.