Search code examples
sqlsnowflake-cloud-data-platformdata-warehouse

Search Optimization Service in Snowflake is not used


I have a Question on Search Optimization Service on Snowflake:

I have two tables Store_sales and Customers in my db and both are enabled with search optimization service, i have a query which does filter on date column in store_sales.date column as below by joining customer data:

select ss.SS_SOLD_DATE_SK, ss.SS_ITEM_SK, c.C_FIRST_NAME from STORE_SALES ss
join SS_CUSTOMER c 
on ss.SS_CUSTOMER_SK = c.C_CUSTOMER_SK  
where ss.SS_SOLD_DATE_SK = 2451148;

In the execution plan i am seeing Search optimization service used while scanning STORE_SALES table whereas Search Optimization is not used on SS_CUSTOMER table, due to this full table scan query is slowing down, similarly if i have point lookup filter on customer and querying STORE_SALES search optimization on STORE_SALES is not used. How to use search optimization service which is ON both tables to be used on both the tables?

Below is execution plan:

enter image description here


Solution

  • How Search Optimization Affects Joins

    The search optimization service does not directly improve the performance of joins. However, it can improve the performance of filtering rows from either table prior to the join, if the table has search optimization enabled and the predicate is selective.

    Both tables do not need to have search optimization enabled. The decision to use search optimization is made for each table independently.

    The above is is taken from the documentation where I have added my own bold.

    So it would seem that the query optimizer/planner has decided that SO is not needed for the customer table in this instance. Unfortunately I didn't see anything about how (or if) it's possible to force it to use SO, nor anthing about how it decides whether or not to use SO. I have read guidlines about the the number of distinct values in the query predicate being large: at least one of the columns accessed through the query filter operation has at least 100k-200k distinct values.