Search code examples
mysqlinnodbquery-performance

Estimating number of results from a MySQL "SELECT WHERE EXISTS" query?


I have a simple database of "things" that can have zero or more "categories" or "tags" . I've written a stored procedure that will get the first N objects in a given category, and performance is very good. It looks something like

SELECT * FROM things
WHERE things.datestamp > @start AND things.datestamp < @end
  AND EXISTS (
    SELECT 1 from thing_tags
    WHERE things.id = thing_tags.thing_id
      AND thing_tags.tag = @searchTag
  )
LIMIT ?

With a few hundred thousand "things", each having about 0-5 tags, performance is fine -- I can get the first couple hundred matches in tens of milliseconds at most.

However, if I want to know how many total matches there are, it takes ages -- many seconds, at least. Is there a smarter way than just SELECT COUNT(id) FROM .... (rest of query above)? The id field is indexed, per this suggestion, but the index doesn't help much since it has to inspect the tags table for each row in things.

I'm looking at implementing pagination, and I know LIMIT ?,? (or LIMIT ? OFFSET ?) would make it easy, but it would be nice to show the user an approximation at least of how many total "matches" to expect.


Solution

  • If it helps anybody with a similar problem, I wound up giving up -- I do a second query with a bigger (but still reasonable) limit, then render the result as "1-10 of 100+" (or whatever the bigger limit was). This was good enough for my needs.

    The short answer is that there is no good way to get a "pretty close" estimate for this kind of query, in this kind of database, without manually maintaining a separate count value somewhere else.