sql-server, database, query-optimization

Reduce the size of a table with a multi-column index


I have a table with only 4 columns and over 1 billion rows. Call the columns a, b, c, and d. My application filters data on (a, b, c) or on (a, b), so I created an index on the columns (a, b, c), in that order. However, this doubled the size of the table. I suspect the reason is that columns a, b, and c are now stored in both the table data and the index. Can anyone suggest a way to reduce the size of this table?

My table schema:

CREATE TABLE Message (
   userId bigint NOT NULL,
   campaignId int NOT NULL,
   notificationId int NOT NULL,
   isOpened bit NOT NULL
);

I need to filter data by (userId, campaignId, notificationId) and by (userId, campaignId).
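
For concreteness, the two query shapes might look roughly like the sketch below. The variable values and the selected columns are placeholders for illustration, not taken from the real application. Both predicates use a leading prefix of (userId, campaignId, notificationId), which is why a single index in that column order can serve both.

```sql
-- Placeholder values, for illustration only.
DECLARE @userId bigint = 42,
        @campaignId int = 7,
        @notificationId int = 1001;

-- Shape 1: filter on all three columns.
SELECT isOpened
FROM Message
WHERE userId = @userId
  AND campaignId = @campaignId
  AND notificationId = @notificationId;

-- Shape 2: filter on the two leading columns only.
SELECT notificationId, isOpened
FROM Message
WHERE userId = @userId
  AND campaignId = @campaignId;
```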


Solution

  • Assuming you don't already have a clustered index on the table, create the index as clustered so that the index leaf nodes are the data rows. This avoids storing the key and included columns redundantly, once in the table and once in the index. Also specify UNIQUE if the key values are unique, since that can improve execution plan quality.

    CREATE CLUSTERED INDEX cdx_YourTable ON dbo.YourTable(a, b, c);
    

    You can reduce the size further with PAGE or ROW compression, as shown below. Although compression incurs additional CPU overhead, the cost is often more than offset by reduced I/O and better buffer cache efficiency for the same data.

    CREATE CLUSTERED INDEX cdx_YourTable ON dbo.YourTable(a, b, c)
    WITH(DATA_COMPRESSION=PAGE);
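
Applied to the Message table from the question, the steps above might look like the following sketch. It rests on several assumptions: the table lives in the dbo schema, IX_Message_Existing is a hypothetical stand-in for the name of the nonclustered index that already exists, and UNIQUE is only valid if each (userId, campaignId, notificationId) combination occurs at most once, which the question does not state.

```sql
-- Optional: estimate the PAGE-compressed size before changing anything.
-- NULL for @index_id and @partition_number means all indexes and all partitions.
EXEC sys.sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'Message',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = N'PAGE';

-- Drop the existing nonclustered index first (hypothetical name).
-- Building a clustered index on a heap rebuilds every nonclustered index,
-- so dropping it beforehand avoids that extra work.
DROP INDEX IX_Message_Existing ON dbo.Message;

-- Make the index the table itself. Remove UNIQUE if duplicate keys are possible;
-- the statement fails if duplicates exist.
CREATE UNIQUE CLUSTERED INDEX cdx_Message
    ON dbo.Message (userId, campaignId, notificationId)
    WITH (DATA_COMPRESSION = PAGE);

-- Check the resulting size.
EXEC sys.sp_spaceused @objname = N'dbo.Message';
```

On a table with over a billion rows the index build is a long, resource-intensive operation, and queries have no supporting index between the DROP and the end of the CREATE, so it is worth trying this on a copy of the data first.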