Search code examples
sql-serveralgorithmsql-server-2016columnstore

What is the underlying Storage and Search Algorithm for SQL Server Columnstore Indexes


I'm trying to figure out the guts of how Columnstore Indexes work within SQL Server. What I'm looking for is a technical reference guide or a whitepaper to the underlying storage and accompanying search algorithms for Columnstore Indexes, specifically regarding SQL 2016 (in case that differs from earlier versions). I don't even know if this algorithm/design has a formal academic name or not, as I've not found anything resembling one in the Microsoft documentation I've reviewed.

An equivalent of what I'm after regarding traditional rowstore indexes is that their underlying Storage and Search Algorithms are based on B+ Trees. The B+ Tree algorithm has a plethora of white papers out there to digest. The only algorithm reference I do see regarding Columnstore Indexes pertains to the DeltaStore functionality which is also based on B+ Trees.

I hope the underlying storage and search algorithm isn't proprietary and that my Google skills are just failing me, but if it turns out this is proprietary, knowing that would help quell my curiosity. Any help would be appreciated!


Solution

  • At this point, the best resource I've come across is The-Paper-Trail.org's blog post on columnar storage. It doesn't get into the details behind the search algorithms, but it has some great explanations behind the underlying storage as well as additional references to academic white-papers. If anyone else is interested in this stuff, I would highly recommend you review this page earlier rather than later.

    EDIT: Upon further reading, it looks like the "search algorithm" for Columnstore Indexes is basically a vanilla scan of the index, less any Rowstore Elimination and Column Elimination. The scan operation is made even more efficient by being executed in batch-mode against highly compressed data (due to the column-wise storage model) and depending on the query, aggregate and string predicate pushdown optimizations can further limit records pulled from disk. Columnstore indexes - query performance

    These two resources combined give a pretty good picture of what's going on under the covers, so if you're interested, give them a look. Finally, a word of advice; ignore or skip much of the literature published prior to the release of SQL 2016, because a lot of the underlying terms and logic have changed significantly over the past 3 versions of SQL Server, and I wouldn't recommend anyone use anything earlier than 2016 if you're going to use this feature.

    EDIT 2: I found an article from Microsoft confirming Columnstore Indexes are not B+ Trees.