Search code examples
sql-serverindexingcolumnstore

What is a columnstore index and how is different from clustered and non-clustered indexes?


I am confused about the columnstore index.

What is a columnstore index, and how it is different from clustered and non-clustered indexes?


Solution

  • Assume you have a table like below with col1 as primary key:

    col1 (PK) col2 col3
    1 2 3
    4 5 6

    Normal indexes will be stored "row-wise" (per-row), so all the columns of a single given row reside on a single page (assuming a page can hold only one row):

    Page col1 col2 col3
    page1 1 2 3
    page2 4 5 6

    So when you want to read something across multiple rows, e.g. with SUM(col3), SQL Server will need to read both page 1 and page 2 in order to deliver, which is a total "cost" of two pages read in this example.

    With columnstore indexes, the same data will be stored "column-wise", in per-column pages instead:

    Page row1 row2
    page1 1 4
    page2 2 5
    page3 3 6

    This means that if you want to do that same SUM(col3) from before, SQL Server now has to read just one page (page 3), rather than two pages as before.

    Columnstore indexes provide a clear benefit for column-centric queries because they allow the processing engine to make fewer reads of data that it doesn't need, using less memory and I/O.

    As a result, in large tables where such queries are run frequently, the performance benefit of an appropriate columnstore index can be enormous (up to 10x performance and data compression gains according to the Microsoft SQL docs).