I have a table that stores photo information with id as Primary key:
id(PK), title, album_id, posted_by, published, filename, tags, ratings, date_posted
This table will hold infor of 100+ Million photos and I need to run this query like these frequently:
1) get all photos (just id,filename,title columns) of a given album
select id, filename, title from photos where album_id = @AlbumId and published = 1
2) get all published photos of a given user but exclude photos of currently viewing album
select id, filename, title from photos where posted_by='bob' and album_id <>10 and published = 1
I want to avoid index and table scanning. I need to use seek(say 100%) as much as possible.
Can this be done? What type of index and on which columns would help me achieve this?
Thanks
In reality, you'll only be able to find this out yourself by measuring performance before you tweak, then tweak, and measure again and again.
But based on your query, you should consider (or at least try this first) a non-clustered index like this:
CREATE NONCLUSTERED INDEX IX01_Photos
ON dbo.Photos(album_id, published, posted_by)
INCLUDE(id, filename, title)
Reasoning:
album_id
and published
- so use these two columns first in your indexposted_by
in the WHERE clause - put that into that same index as the third columnid, filename, title
columns in the indexWith all these things in place, you should be seeing mostly index seeks on that new non-clustered index to satisfy your queries. But again: lots of other factors also come into play which you probably haven't mentioned in your question and possibly not even thought about yourself - but this approach should give you a good starting point no less.