Search code examples
sqlsql-serverfull-text-searchazure-sql-databasestar-schema

Full text search with CONTAINS is very slow


We try to use Full text search on Azure database and got performance problems on using CONTAINS search.

Our data has star schema, Fact table has clustered column store index enabled and around 40 million rows. Below is how we use CONTAINS on dimension and do aggregation on Fact table on different queries:

Query 1 using EXISTS:

SELECT f.[FK_DimCompanyCodeId], SUM(f.NetValueInUSD)
FROM [SPENDBY].[FactInvoiceDetail] f

WHERE EXISTS (
        SELECT * FROM [SPENDBY].[DimCompanyCode] d

        WHERE f.[FK_DimCompanyCodeId] = d.Id
        AND CONTAINS(d.*, 'Comcast'))

GROUP BY f.[FK_DimCompanyCodeId]

ORDER BY SUM(f.NetValueInUSD) DESC

This query seems run forever and never return the result.

There is non-clustered index on the foreign key FK_DimCompanyCodeId] and there is only one row returned when searching Comcast:

SELECT id  FROM [SPENDBY].[DimCompanyCode] d
WHERE CONTAINS(d.*, 'Comcast');
-- will return id = 5

And there are around 27 million rows of Fact table which has FK_DimCompanyCodeId = 5.

Query 2 using INNER JOIN:

SELECT f.[FK_DimCompanyCodeId], SUM(f.NetValueInUSD)
FROM [SPENDBY].[FactInvoiceDetail] f

INNER JOIN [SPENDBY].[DimCompanyCode] d ON (f.[FK_DimCompanyCodeId] = d.Id)
WHERE CONTAINS(d.*, 'Comcast')

GROUP BY f.[FK_DimCompanyCodeId]
ORDER BY SUM(f.NetValueInUSD) DESC

This query seems run forever and never return the result as well.

Query 3 using #temp table:

SELECT id INTO #temp FROM [SPENDBY].[DimCompanyCode] d
WHERE CONTAINS(d.*, 'Comcast');

SELECT f.[FK_DimCompanyCodeId], SUM(f.NetValueInUSD)
FROM [SPENDBY].[FactInvoiceDetail] f

WHERE EXISTS (
        SELECT * FROM #temp
        WHERE f.[FK_DimCompanyCodeId] = #temp.Id)

GROUP BY f.[FK_DimCompanyCodeId]

ORDER BY SUM(f.NetValueInUSD) DESC

Very fast, returns the result after 5 seconds.

Why full text search is so slow for in case 1 and case 2.


Solution

  • Eventually, I figured out CONTAINS works well on specific column (Description for example):

    SELECT f.[FK_DimCompanyCodeId], SUM(f.NetValueInUSD)
    FROM [SPENDBY].[FactInvoiceDetail] f
    WHERE  f.[FK_DimCompanyCodeId] IN  (
            SELECT d.Id FROM [SPENDBY].[DimCompanyCode] d
            WHERE CONTAINS(d.[Description], 'Comcast')
    )
    GROUP BY f.[FK_DimCompanyCodeId]
    ORDER BY SUM(f.NetValueInUSD) DESC
    

    In order to search for the whole table, CONTAINSTABLE will have the best performance and avoid using #temp table:

    SELECT f.[FK_DimCompanyCodeId], SUM(f.NetValueInUSD)
    FROM [SPENDBY].[FactInvoiceDetail] f
    LEFT OUTER JOIN CONTAINSTABLE([SPENDBY].[DimCompanyCode], *, '"Comcast"') ct 
    ON f.[FK_DimCompanyCodeId] = ct.[Key]
    WHERE ct.[Key] IS NOT NULL
    GROUP BY f.[FK_DimCompanyCodeId]
    ORDER BY SUM(f.NetValueInUSD) DESC