In this example, how is a columnar
database like Vertica
better than an rdbms
w.r.t speed?
I am assuming that Symbol
and date
are primary key
w.r.t rdbms
From the example I am following, the projection looks like the following but I can't figure out why the no. of reads is less
If tickstore only has the three columns you are using in your example (and practically only then), the row-oriented DBMS might be faster - especially if you have symbol
and date
indexed.
But in any other case: each column in a columnar database (and Vertica was the first widely known one) boils down to a file by itself. So , in the case of 103 columns in the table, the row-oriented DBMS reads 100 columns too many from disk, while Vertica reads only 3; the row-oriented DBMS needs to navigate through the complete read row to find the starts of each column needed in the query; and Vertica can rely on the fact that each column-file contains exactly one data type, and can optimise both reading and writing processes to take advantage of that knowledge.
Plus there are encoding algorithms for column-oriented data that would blow up the space of this answer to go into detail for here.
To sum it up: the only thing a column oriented database does not like is a SELECT *
of a 100-plus column table.