Search code examples
postgresqlwhere-clausewhere-in

Potsgres Performance: WHERE = Versus WHERE IN (...) for single values


I am working on an application that uses Postgres as its backing database. I am working on a component that executes a SELECT query on the DB, using a multi-select input control on the UI to supply the input values for the query. Previously, this input control was a traditional select control, so only one option could be specified at a time. That meant the SQL query would look something like this:

SELECT * FROM items WHERE code = 'value1';

After the multi-select has been implemented, the SQL query would look something like this:

SELECT * FROM items WHERE code IN ('value1', 'value2', 'value3');

However, I have a question about when the user only specifies a single value in the multi-select. That would mean that only one value is specified within the parentheses:

SELECT * FROM items WHERE code IN ('value1');

I recognize that this query is semantically the same as the query that used WHERE ... = .... My question is whether there is a significant performance difference between the two. I have considered adding application logic that chooses whether to replace the WHERE ... IN (...) statement with its WHERE ... = ... counterpart if only one value is specified? Is this optimization necessary? In fact, if the performance is indeed the same in both cases, then why bother ever using WHERE ... = ... when WHERE ... IN (...) is more flexible? Am I worrying too much about a micro-optimization?

Thank you.


Solution

  • TLDR: It does not make a difference, the performance will always be the same, therefore the IN is a clear winner for your use case.

    A quick test to verify this could be done:

    test=# CREATE table test_in (id serial primary key);
    CREATE TABLE
    test=# explain select * from test_in where id = '1';
                                       QUERY PLAN
    ---------------------------------------------------------------------------------
     Index Only Scan using test_in_pkey on test_in  (cost=0.15..2.17 rows=1 width=4)
       Index Cond: (id = 1)
    (2 rows)
    
    test=# explain select * from test_in where id in ('1');
                                       QUERY PLAN
    ---------------------------------------------------------------------------------
     Index Only Scan using test_in_pkey on test_in  (cost=0.15..2.17 rows=1 width=4)
       Index Cond: (id = 1)
    (2 rows)
    

    Notice that the plan is the same, so is the index condition, this ensures that the same cost is not just accidental. During the query rewriting stage of the execution PostgreSQL detects some simple cases of equivalency and rewrites them in a canonical form.

    To see another example of query that is still semantically equivalent but is not rewritten to the exact same plan you could try:

    test=# explain select * from test_in where id in ('1', '1');
                                       QUERY PLAN
    ---------------------------------------------------------------------------------
     Index Only Scan using test_in_pkey on test_in  (cost=0.15..3.34 rows=2 width=4)
       Index Cond: (id = ANY ('{1,1}'::integer[]))
    (2 rows)