I have a simple database of "things" that can have zero or more "categories" or "tags" . I've written a stored procedure that will get the first N objects in a given category, and performance is very good. It looks something like
SELECT * FROM things
WHERE things.datestamp > @start AND things.datestamp < @end
AND EXISTS (
SELECT 1 from thing_tags
WHERE things.id = thing_tags.thing_id
AND thing_tags.tag = @searchTag
)
LIMIT ?
With a few hundred thousand "things", each having about 0-5 tags, performance is fine -- I can get the first couple hundred matches in tens of milliseconds at most.
However, if I want to know how many total matches there are, it takes ages -- many seconds, at least. Is there a smarter way than just SELECT COUNT(id) FROM .... (rest of query above)
? The id
field is indexed, per this suggestion, but the index doesn't help much since it has to inspect the tags
table for each row in things
.
I'm looking at implementing pagination, and I know LIMIT ?,?
(or LIMIT ? OFFSET ?
) would make it easy, but it would be nice to show the user an approximation at least of how many total "matches" to expect.
If it helps anybody with a similar problem, I wound up giving up -- I do a second query with a bigger (but still reasonable) limit, then render the result as "1-10 of 100+" (or whatever the bigger limit was). This was good enough for my needs.
The short answer is that there is no good way to get a "pretty close" estimate for this kind of query, in this kind of database, without manually maintaining a separate count value somewhere else.