Search code examples
mysqldatabase-performancequery-performance

Select count vs select statement performance when the query will return 0 rows


I'm looking through some code an old developer wrote, who is no longer around. I found something that I'm not quite sure makes sense to be doing, but I don't think I have enough knowledge or good enough google fu to find the answer, so here goes:

The code looks like this:

var queryCount = SELECT COUNT(Col1) FROM MyTable WHERE ColumnInIndex = 'SomeValue'

if(queryCount == 0)
   return [];

var results = SELECT Col1, Col2, Col3,... FROM MyTable WHERE ColumnInIndex = 'SomeValue'
return results;

So my first thought was that seems redundant and should just use the results from the select query in the first place. I dug a little farther because the commit suggests its for perf tuning.

From what I understand from googling is that count in fact would be faster because it doesn't have to return any data or pull any rows & can rely solely on the index.

However, in the pseudocode I've written above, the case in which the where clause wouldn't return any rows it seems like we wouldn't get a performance boost because neither count nor select would return any results. Plus, when there are actually results we're doing two of these queries instead of just the one.

Additionally, it seems like since we're using the rows in the first place, we should just get the data in one query vs making two trips to the db.

Am I completely off base here?

We're using MySql v5.6.46


Solution

  • You are correct. [But read all the way to the end!]

    Two steps is inefficient -- in any version of MySQL. If there are zero rows, the Optimizer will do essentially the same amount of work for either SELECT.

    If there are some rows, then the first SELECT is a waste of time. See the programming acronym 'KISS'.

    The pseudo-code implies that the 'return' is an empty array; I am assuming that is the case for either query when there is no matching row?

    A side note: I hope ColumnInIndex means "a column that is first in some index". If the column is not first, neither query will use the index. (This comment does not affect the main Question.)

    Another side note (aimed at PTank): When there are no matching rows, neither of these

    SELECT col FROM ... WHERE ...
    SELECT  *  FROM ... WHERE ...
    

    return any rows; they do not return NULL for column(s). (The use of * is 'bad' for multiple reasons unrelated to the original Question.)

    Oh, yet another comment. COUNT(col) checks col for being not-NULL before counting the row. COUNT(*) simply counts the rows. In almost all cases, you should use COUNT(*); it's simpler, faster, and probably gives the same answer. (The answer will be the same if col is the PRIMARY KEY.)

    Good grief!! That means that your first query is not redundant! The odd-ball case is when col1 is NULL in every row. The first query will return a count of zero, but the second query will return some rows (with the first column NULL).

    Was this a trick question?