This is a long, involved question about index optimization theory. It is not homework, though I first came across it in a sample exam for Microsoft's 70-432. The original question was about general query optimization, but while working through it I found some peculiar behavior I could not explain.
First, the table:
CREATE TABLE Invoice_details (
    Invoice_id   int NOT NULL,
    Customer_id  int NOT NULL,
    Invoice_date datetime DEFAULT GETDATE() NULL,
    Amount_total int NULL,
    Serial_num   int IDENTITY (1,1) NOT NULL
);
Now, a clustered index, and the two indexes for testing:
CREATE UNIQUE CLUSTERED INDEX [ix_serial] ON [dbo].[Invoice_details] ([Serial_num] ASC)
/* Below is the "original" index */
CREATE NONCLUSTERED INDEX [ix_invoice_customer] ON [dbo].[Invoice_details]
([Invoice_id] ASC,[Customer_id] ASC)
/* Below is the "optimized" index (adds one included field) */
CREATE NONCLUSTERED INDEX [ix_invoice_customer_inc] ON [dbo].[Invoice_details]
([Invoice_id] ASC,[Customer_id] ASC) INCLUDE ([Invoice_date])
I also loaded 100,000 rows of random test data into the table. Invoice_id, Customer_id, and Amount_total each received their own random values (range 1000-9999), and Invoice_date received GETDATE() plus a random number of seconds (range 1000-9999). I can provide the actual routine I used, but I did not think the specifics would be relevant.
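For anyone who wants to reproduce the setup, data of the shape described above could be generated along these lines. This is a hypothetical sketch, not the routine actually used: the row source (a cross join of sys.all_objects) and the ABS(CHECKSUM(NEWID()) % 9000) trick for per-row random integers are assumptions chosen for brevity.

```sql
-- Hypothetical population routine; the original script is not shown in the post.
-- Assumes dbo.Invoice_details exists as defined above and is empty.
SET NOCOUNT ON;

WITH n AS (
    -- Any sufficiently large row source works; this cross join is only a convenience.
    SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO dbo.Invoice_details (Invoice_id, Customer_id, Invoice_date, Amount_total)
SELECT
    -- CHECKSUM(NEWID()) % 9000 lies in -8999..8999; ABS maps it to 0..8999,
    -- so each expression below falls in the range 1000..9999.
    1000 + ABS(CHECKSUM(NEWID()) % 9000),
    1000 + ABS(CHECKSUM(NEWID()) % 9000),
    DATEADD(SECOND, 1000 + ABS(CHECKSUM(NEWID()) % 9000), GETDATE()),
    1000 + ABS(CHECKSUM(NEWID()) % 9000)
FROM n;
```

Serial_num is filled by the IDENTITY property, so it is left out of the column list. The distribution is only approximately uniform, which should be close enough for a test like this.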
And finally, the query:
SELECT Invoice_id,Customer_id,Invoice_date FROM Invoice_details WHERE Customer_id=1234;
Obviously, the query's first step will be a nonclustered index scan, since Customer_id is not the leading key column of either index. Regardless of which index is used, that first step will return the same number of index rows. With the "original" index, the next step will be a key lookup via the clustered index to retrieve Invoice_date, followed by an internal join (a nested loops join in the plan) between the two sets. With the "optimized" index, that column is included in the index leaf, so the optimizer goes straight to returning the results.
Which index results in faster execution, and why?
It depends ... on the tipping point.
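To see where that tipping point falls on this test table, one option is to force each access path with an index hint and compare the reported I/O. The sketch below assumes the table and indexes defined in the question, under the dbo schema; the hints are there only to make the comparison possible and are not a recommendation for production code. The third query forces the clustered index so that the full-scan alternative is visible too.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- "Original" index: scan of the narrower index, then one key lookup
-- into the clustered index per matching row to fetch Invoice_date.
SELECT Invoice_id, Customer_id, Invoice_date
FROM dbo.Invoice_details WITH (INDEX (ix_invoice_customer))
WHERE Customer_id = 1234;

-- "Optimized" index: scan of the wider, covering index; no lookups needed.
SELECT Invoice_id, Customer_id, Invoice_date
FROM dbo.Invoice_details WITH (INDEX (ix_invoice_customer_inc))
WHERE Customer_id = 1234;

-- Baseline: clustered index scan, i.e. reading every row of the table.
SELECT Invoice_id, Customer_id, Invoice_date
FROM dbo.Invoice_details WITH (INDEX (ix_serial))
WHERE Customer_id = 1234;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

In Management Studio, the Messages tab then shows logical reads and elapsed time for each statement. With Customer_id spread over roughly 9,000 distinct values, only about 11 of the 100,000 rows should match, so the handful of lookups on the "original" index is being traded against the extra pages of the wider "optimized" index. Rerunning the comparison with a much less selective predicate should show the lookup cost growing until a plain scan wins.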