Search code examples
sql-serverperformancet-sqltable-valued-parameters

SQL Query on single table-valued parameter slow on large input


I have a table with this simple definition:

CREATE TABLE Related 
(
    RelatedUser NVARCHAR(100) NOT NULL FOREIGN KEY REFERENCES User(Id),
    RelatedStory BIGINT NOT NULL FOREIGN KEY REFERENCES Story(Id),
    CreationTime DateTime NOT NULL,

    PRIMARY KEY(RelatedUser, RelatedStory)
);

with these indexes:

CREATE INDEX i_relateduserid 
    ON Related (RelatedUserId) INCLUDE (RelatedStory, CreationTime)

CREATE INDEX i_relatedstory 
    ON Related(RelatedStory) INCLUDE (RelatedUser, CreationTime)

And I need to query the table for all stories related to a list of UserIds, ordered by Creation Time, and then fetch only X and skip Y.

I have this stored procedure:

CREATE PROCEDURE GetStories
    @offset INT,
    @limit INT,
    @input UserIdInput READONLY
AS
BEGIN
    SELECT RelatedStory 
    FROM Related
    WHERE EXISTS (SELECT 1 FROM @input WHERE UID = RelatedUser)
    GROUP BY RelatedStory, CreationTime
    ORDER BY CreationTime DESC
    OFFSET @offset ROWS FETCH NEXT @limit ROWS ONLY;
END;

Using this User-Defined Table Type:

CREATE TYPE UserIdInput AS TABLE 
(
    UID nvarchar(100) PRIMARY KEY CLUSTERED
)

The table has 13 million rows, and gets me good results when using few userids as input, but very bad (30+ seconds) results when providing hundreds or a couple thousand userids as input. The main problem seems to be that it uses 63% of the effort on sorting.

What index am I missing? this seems to be a pretty straight forward query on a single table.


Solution

  • So I finally found a solution.

    While @srutzky had good suggestions of normalizing the tables by changing the NVARCHAR UserId to an Integer to minimize comparison cost, this was not what solved my problem. I will definitely do this at some point for the added theoretical performance, but I saw very little change in performance after implementing it right off the bat.

    @Paparazzi suggested I added an index for (RelatedStory, CreationTime), and that did not do what I needed either. The reason was, that I also needed to also index RelatedUser as that's the way the query goes, and it groups and orders by both CreationTime and RelatedStory, so all three are needed. So:

    CREATE INDEX i_idandtime ON Related (RelatedUser, CreationTime DESC, RelatedStory)
    

    solved my problem, bringing my unacceptable query times of 15+ seconds down to mostly 1-second or a couple of seconds querytimes.

    I think what gave me the revelation was @srutzky noting:

    Remember, "Include" columns are not used for sorting or comparisons, only for covering.

    which made me realize I needed all my groupby and orderby columns in the index.

    So while I can't mark either of the above posters post as the Answer, I'd like to sincerely thank them for their time.