Search code examples
sql-serverindexingquery-optimizationazure-sql-managed-instancecomposite-index

Composite index on three columns


We have got 93M rows mapping table which is keeping mapping information of three tables and corresponding three tables. We are facing performance issues in accessing data from the mapping table.

TableName PK information Number of Rows
Seller SellerId Primary Key 3000 rows
Store StoreId Primary Key 20000 rows
Product ProductId Primary Key 200k rows
SellerStoreProductMapping SellerId,ProductId,StoreId Composite non-clustered index. There is also one surrogate key in this table: SellerStoreProductMappingId which is used as Primary Key clustered 93M rows

Our queries can access any of the three combination : Seller, Product, Store in the 93M rows table.

My actual query is like this:

SELECT < many columns from four tables>
FROM SellerStoreProductMapping
INNER JOIN Store
INNER JOIN Seller
INNER JOIN Product ...
WHERE SellerId = 123

But, what is happening is the non-clustered index: SellerId,ProductId,StoreId is rarely used, in our queries, even if we are having filter on SellerId. It goes for index scan for storeId.

Our doubt is, for these three column combination,

  1. should we go for separate non-clustered indexes (3 indexes) ?
  2. should we go for two column non-clustered composite indexes (4 indexes) ?
  3. should we go for three column composite non-clustered indexes (9 indexes). Instead of 9, We will limit to specific usage scenarios. ?

Note: We cannot create clustered column store index, as we are having ROWVERSION datatype as one of the datatypes in the mapping table.


Solution

  • Your index will be used systematically if this index is covering all the information in the query and if predicate is sargable.

    As an example, let us see those queries :

    --1
    SELECT *
    FROM   SellerStoreProductMapping
    WHERE  Seller = 1 AND Product = 2 AND Store = 1
    -- 2
    SELECT Seller, Product, Store
    FROM   SellerStoreProductMapping
    WHERE  Seller = 1 AND Product = 2 AND Store = 1
    -- 3
    SELECT anyOtherColumns
    FROM   SellerStoreProductMapping
    WHERE  Seller = 1 AND Product = 2 AND Store = 1
    -- 4
    SELECT Seller, Product, Store
    FROM   SellerStoreProductMapping
    WHERE  Seller = 1 AND Product = 2 AND Store = 1
    ORDER  BY anyOtherColumns
    

    Only query 2 will systematicaly use the index. All the other queries (1, 3, 4) does not have all the columns used in the query, in the index key... So they must use a double read :

    • first read (seek in the index) to find candidate rows that are qualified under the predicate
    • second read in the table to find the columns that the index does not have

    The cost of the two reads is compare to the cost of other strategies like scanning the table. If the cost of the scan is lower, the index won't be used...