Search code examples
sql-serverindexingcolumnstore

How columnstore index knows which data from one column are connected to data from other columns?


I am new to the use of columnstore index. The new different structure of columnstored data raised a question. How we know which data from one column1 (page1) are connected to other column2 (page2).

For example if we have the following representation of a table using traditional rowstore:

row1  1   2  3 -- page1
row2  4   5  6 -- page2

And for columnstore index:

col1  col2  col3
1      2     3
4      5     6

How we know using columnstore index which data are connected to who?


Solution

  • You are not totally getting rid of the relationship between the columns and their rows. The simplified difference is the way the table is stored. Traditional storage is physically stored in a row-wise manner while a columnstore is stored column-wise. The doc link provided here has much more info that I would prefer to not copy and paste.

    From the docs:

    Key terms and concepts These are key terms and concepts are associated with columnstore indexes.

    columnstore A columnstore is data that is logically organized as a table with rows and columns, and physically stored in a column-wise data format.

    rowstore A rowstore is data that is logically organized as a table with rows and columns, and then physically stored in a row-wise data format. This has been the traditional way to store relational table data. In SQL Server, rowstore refers to table where the underlying data storage format is a heap, a clustered index, or a memory-optimized table.

    So here is an example of how the relationship is retained in TSQL. Run this against a table that has a CS index (Disclaimer: I am not a pro at CS indexing):

    SELECT o.name AS table_,
    i.name AS index_, 
    i.type_desc AS index_type, 
    p.partition_number, 
    rg.row_group_id, 
    cs.column_id, 
    c.name AS column_
    
    
    FROM sys.objects o
    
    INNER JOIN sys.indexes i 
        ON i.object_id = o.object_id
    
    INNER JOIN sys.partitions p
        ON p.object_id = o.object_id
        AND i.index_id = p.index_id
    
    INNER JOIN sys.column_store_row_groups rg
        ON rg.object_id = o.object_id 
        AND i.index_id = rg.index_id
    
    INNER JOIN sys.column_store_segments cs
        ON cs.partition_id = p.partition_id
    
    INNER JOIN sys.columns c
        ON c.object_id = o.object_id
        AND c.column_id = cs.column_id
    
    
    WHERE o.object_id = OBJECT_ID(your_table_name)