Search code examples
mysqldistinct-values

Which is faster: SELECT DISTINCT or WHERE foo != 0?


id | foo | bar
--------------
0  | 0   | ...
1  | 1   | ...
2  | 2   | ...
3  | 0   | ...
4  | 2   | ...

I need all unique foo values, but not "0" which is in very often.

Which is faster?

SELECT foo FROM `table` WHERE foo != 0

or

SELECT DISTINCT foo FROM `table`

The last would keep the 0 but be removed in PHP.

On my server both were fast enough but one of these two option might be theoretically faster :)


Solution

  • Here's an indexed data set of 130,000 rows. The sparse column has values in the range 0-100000. The dense column has values in the range 0-100.

    SELECT * FROM my_table;
    +----+--------+-------+
    | id | sparse | dense |
    +----+--------+-------+
    |  1 |      0 |     0 |
    |  2 |  52863 |    87 |
    |  3 |  76503 |    21 |
    |  4 |  77783 |    25 |
    |  6 |  89359 |    73 |
    |  7 |  97772 |    69 |
    |  8 |  53429 |    59 |
    |  9 |  35206 |    99 |
    | 13 |  88062 |    44 |
    | 14 |  56312 |    49 |
    ...
    
    SELECT * FROM my_table WHERE sparse <> 0;
    130941 rows in set (0.09 sec)
    
    SELECT * FROM my_table WHERE dense <> 0;
    130289 rows in set (0.09 sec)
    
    SELECT DISTINCT sparse FROM my_table;
    72844 rows in set (0.27 sec)
    
    SELECT DISTINCT dense FROM my_table;
    101 rows in set (0.00 sec)
    

    As you can see, whether or not DISTINCT is faster depends very much on the density of the data.

    Obviously, in this instance, the two queries are very different from each other!