Search code examples
mysqlsqlquery-optimization

Best way to optimise a simple MYSQL COUNT query


I have a table called signups and all I want is the COUNT of all signups where is_email_verified is true.

Attempt 1 results in an index scan:

SELECT COUNT(*) FROM signups WHERE is_email_verified = true

Attemp 2 results in an index scan:

SELECT SUM(is_email_verified) from signups

Ideally I could just create an index on SUM(is_email_verified) but that functionality doesn't exist.

I could create a trigger and a summary table that auto updates when an email is verified but I feel like there should be an easier way that I am not seeing.

Any ideas?


Solution

  • COUNT(*) is notorious for needing to scan through the rows it's counting. Why? It's a long story involving ACID and the integrity of concurrent updates.

    You mentioned an index scan. I guess that means you have an index on is_email_verified. Your query, as written, will scan that index looking for the entries where that column is 1. It will not scan the part of the index where the value is zero. That's how it works. An sequential index scan here is probably more efficient than a table scan (less data to wrangle) but a scan is still necessary.

    Your second query, SUM(is_email_verified) will have to scan the entire index. That means it will be a bit slower.

    If the performance of this query is truly causing problems for your app, there are several strategies you can apply. You mentioned a series of triggers. You'll need them for INSERT, UPDATE, and DELETE operations on the table. That could work.

    You could also arrange for some kind of application-side memoization cache. That's a fancy way of saying "stash that number of verified users someplace and only update it once in a while".

    Still, my (opinionated) advice is "don't worry about it now". Get your app working with straightforward easy-to-read queries like the one you showed us, and get yourself a growing base of happy users. You will have to do optimization work as your app gains users, and experience shows that it's really hard to guess exactly what you'll have to optimize.