I have this query below, and it's extremely slow. It takes almost 2 minutes for run to return 3,008 records out of a table with 99 million records. The first query where it gets "Article" data is super fast, less than 1 second and always returns 1 record. It's the second query that's the problem. I don't really want to JOIN these queries. The first one is so quick, and (in my real query) I'm setting more than just @ArticleID for further use.
The query execution plan says it has 75% for it on a clustered key lookup on IX_Name, which didn't make sense to me because I'm not even doing anything with name fields here. Furthermore, Id and ArticleID are both indexes on ArticleAuthor, so I'm not sure what I'm doing wrong. I can't do much with IX_Name being the clustered index...my boss created this table and said to do that.
DECLARE @DOI VARCHAR(72) = '10.1140/EPJC/S10052-012-1993-2'
DECLARE @ArticleID VARCHAR(12)
SELECT
@ArticleID = A.Id
FROM
Article A
LEFT JOIN
JournalName JN WITH (NOLOCK) ON JN.Id = A.JournalId
WHERE
A.DOI = @DOI
PRINT 'GOT ARTICLE DATA ' + format(getdate(), 'yyyy-MM-dd HH:mm:ss.fff')
SELECT
AA.Id
FROM
[ArticleWarehouseTemp]..ArticleAuthor AA WITH (NOLOCK)
WHERE
AA.ArticleID = @ArticleID
PRINT 'GOT ARTICLEAUTHOR DATA ' + format(getdate(), 'yyyy-MM-dd HH:mm:ss.fff')
Please help! This is driving me insane. I've attached the table structure and indexes here too.
CREATE TABLE [dbo].[ArticleAuthor]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[ArticleId] [int] NOT NULL,
[FullName] [nvarchar](128) NULL,
[LastName] [nvarchar](64) NULL,
[FirstName] [nvarchar](64) NULL,
[FirstInitial] [nvarchar](1) NULL,
[OrcId] [varchar](36) NULL,
[IsSequenceFirst] [bit] NULL,
[SequenceIndex] [smallint] NULL,
[CreatedDate] [smalldatetime] NULL CONSTRAINT [DF_ArticleAuthor_CreatedDate] DEFAULT (getdate()),
[UpdatedDate] [smalldatetime] NULL,
[Affiliations] [varbinary](max) NULL
) ON [ArticleAuthorFileGroup] TEXTIMAGE_ON [ArticleAuthorFileGroup]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[ArticleAuthor] WITH CHECK
ADD CONSTRAINT [FK_ArticleId]
FOREIGN KEY([ArticleId]) REFERENCES [dbo].[Article] ([Id])
GO
ALTER TABLE [dbo].[ArticleAuthor] CHECK CONSTRAINT [FK_ArticleId]
GO
CREATE NONCLUSTERED INDEX [IX_ID]
ON [dbo].[ArticleAuthor] ([Id] ASC)
CREATE NONCLUSTERED INDEX [IX_ArticleID]
ON [dbo].[ArticleAuthor] ([ArticleId] ASC)
CREATE CLUSTERED INDEX [IX_Name]
ON [dbo].[ArticleAuthor] ([LastName] ASC, [FirstName] ASC, [FirstInitial] ASC)
If you have to keep the current clustered index as is, you can do the following:
1.
Make sure that you are using correct types:
DECLARE @ArticleID VARCHAR(12)
should be
DECLARE @ArticleID int;
to match the type of the column ArticleId
in the ArticleAuthor
table.
2.
To make sure that index IX_ArticleID
is used efficiently, to make it a covering index, INCLUDE
the column Id
to it:
CREATE NONCLUSTERED INDEX [IX_ArticleID]
ON [dbo].[ArticleAuthor] ([ArticleId] ASC)
INCLUDE(Id);
3.
If you have a very skewed distribution of data, i.e. a number of rows per ArticleId
varies greatly for different articles. Say, if one article has 2 rows and another article has million rows, then you'd better add OPTION(RECOMPILE)
to the query and make sure that statistics and/or index(es) are kept up to date.