Search code examples
sqlsql-servert-sqlsql-server-2014-express

Extremely slow SELECT statement with WHERE on a FK field


I have this query below, and it's extremely slow. It takes almost 2 minutes for run to return 3,008 records out of a table with 99 million records. The first query where it gets "Article" data is super fast, less than 1 second and always returns 1 record. It's the second query that's the problem. I don't really want to JOIN these queries. The first one is so quick, and (in my real query) I'm setting more than just @ArticleID for further use.

The query execution plan says it has 75% for it on a clustered key lookup on IX_Name, which didn't make sense to me because I'm not even doing anything with name fields here. Furthermore, Id and ArticleID are both indexes on ArticleAuthor, so I'm not sure what I'm doing wrong. I can't do much with IX_Name being the clustered index...my boss created this table and said to do that.

DECLARE @DOI VARCHAR(72) = '10.1140/EPJC/S10052-012-1993-2'

DECLARE @ArticleID VARCHAR(12) 

SELECT 
    @ArticleID = A.Id
FROM
    Article A 
LEFT JOIN 
    JournalName JN WITH (NOLOCK) ON JN.Id = A.JournalId 
WHERE 
    A.DOI = @DOI

PRINT 'GOT ARTICLE DATA ' + format(getdate(), 'yyyy-MM-dd HH:mm:ss.fff')

SELECT 
    AA.Id 
FROM 
    [ArticleWarehouseTemp]..ArticleAuthor AA WITH (NOLOCK)
WHERE 
    AA.ArticleID = @ArticleID 

PRINT 'GOT ARTICLEAUTHOR DATA ' + format(getdate(), 'yyyy-MM-dd HH:mm:ss.fff')

Please help! This is driving me insane. I've attached the table structure and indexes here too.

CREATE TABLE [dbo].[ArticleAuthor]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[ArticleId] [int] NOT NULL,
[FullName] [nvarchar](128) NULL,
[LastName] [nvarchar](64) NULL,
[FirstName] [nvarchar](64) NULL,
[FirstInitial] [nvarchar](1) NULL,
[OrcId] [varchar](36) NULL,
[IsSequenceFirst] [bit] NULL,
[SequenceIndex] [smallint] NULL,
[CreatedDate] [smalldatetime] NULL CONSTRAINT [DF_ArticleAuthor_CreatedDate]  DEFAULT (getdate()),
[UpdatedDate] [smalldatetime] NULL,
[Affiliations] [varbinary](max) NULL
) ON [ArticleAuthorFileGroup] TEXTIMAGE_ON [ArticleAuthorFileGroup]
GO

SET ANSI_PADDING OFF
GO

ALTER TABLE [dbo].[ArticleAuthor] WITH CHECK 
   ADD CONSTRAINT [FK_ArticleId] 
   FOREIGN KEY([ArticleId]) REFERENCES [dbo].[Article] ([Id])
GO

ALTER TABLE [dbo].[ArticleAuthor] CHECK CONSTRAINT [FK_ArticleId]
GO

CREATE NONCLUSTERED INDEX [IX_ID] 
ON [dbo].[ArticleAuthor] ([Id] ASC)

CREATE NONCLUSTERED INDEX [IX_ArticleID] 
ON [dbo].[ArticleAuthor] ([ArticleId] ASC)

CREATE CLUSTERED INDEX [IX_Name] 
ON [dbo].[ArticleAuthor] ([LastName] ASC, [FirstName] ASC, [FirstInitial] ASC)

Solution

  • If you have to keep the current clustered index as is, you can do the following:

    1.

    Make sure that you are using correct types:

    DECLARE @ArticleID VARCHAR(12) 
    

    should be

    DECLARE @ArticleID int;
    

    to match the type of the column ArticleId in the ArticleAuthor table.

    2.

    To make sure that index IX_ArticleID is used efficiently, to make it a covering index, INCLUDE the column Id to it:

    CREATE NONCLUSTERED INDEX [IX_ArticleID] 
    ON [dbo].[ArticleAuthor] ([ArticleId] ASC)
    INCLUDE(Id);
    

    3.

    If you have a very skewed distribution of data, i.e. a number of rows per ArticleId varies greatly for different articles. Say, if one article has 2 rows and another article has million rows, then you'd better add OPTION(RECOMPILE) to the query and make sure that statistics and/or index(es) are kept up to date.