I'm creating a filtered index such that the WHERE filter includes the complete query criteria. WIth such an index, it seems that a key column would be unnecessary, though SQL requires me to add one. For example, consider the table:
CREATE TABLE Invoice
(
Id INT NOT NULL IDENTITY PRIMARY KEY,
Data VARCHAR(MAX) NOT NULL,
IsProcessed BIT NOT NULL DEFAULT 0,
IsInvalidated BIT NOT NULL DEFAULT 0
)
Queries on the table look for new invoices to process, i.e.:
SELECT *
FROM Invoice
WHERE IsProcessed = 0 AND IsInvalidated = 0
So, I can tune for these queries with a filtered index:
CREATE INDEX IX_Invoice_IsProcessed_IsInvalidated
ON Invoice (IsProcessed)
WHERE (IsProcessed = 0 AND IsInvalidated = 0)
GO
My question: What should the key column(s) for IX_Invoice_IsProcessed_IsInvalidated
be? Presumably the key column isn't being used. My intuition leads me to pick a column that is small and will keep the index structure relatively flat. Should I pick the table primary key (Id
)? One of the filter columns, or both of them?
Because you have a clustered index on that table it doesn't really matter what you put in the key columns of that index; meaning Id
is there free of charge. The only thing you can do is include
everything in the included section of the index to actually have data handy at the leaf level of the index to exclude key lookups to the table. Or, if the queue is huge, then, perhaps, some other column would be useful in the key section.
Now, if that table didn't have a primary key then you would have to include
or specify as key columns all the columns that you need for joining or other purposes. Otherwise, RID lookups on heap would occur because on the leaf level of indexes you would have references to data pages.