Search code examples
postgresqlquery-optimizationdatabase-administration

Optimizing a query with multiple IN


I have a query like this:

SELECT * FROM table
WHERE department='param1' AND type='param2' AND product='param3'
AND product_code IN (10-30 alphanumerics) AND unit_code IN (10+ numerics)
AND first_name || last_name IN (10-20 names)
AND sale_id LIKE ANY(list of regex string)

Runtime was too high so I was asked to optimize it.

The list of parameters varies for the code columns for different users. Each user provides their list of codes and then loops over product. product used to be an IN clause list as well but it was split up.

Things I tried

By adding an index on (department, type and product) I was able to get a 4x improvement. Current runtime is that some values of product only take 2-3 seconds, while others take 30s.

Tried creating a pre-concat'd column of first_name || last_name, but the runtime improvement was too small to be worth it.

Is there some way I can improve the performance of the other clauses, such as the "IN" clauses or the LIKE ANY clause?


Solution

  • In my experience replacing large IN lists, with a JOIN to a VALUES clause often improves performance.

    So instead of:

    SELECT * 
    FROM table
    WHERE department='param1' 
      AND type='param2' 
      AND product='param3'
      AND product_code IN (10-30 alphanumerics)
    

    Use:

    SELECT * 
    FROM table t
       JOIN ( values (1),(2),(3) ) as x(code) on x.code = t.product_code
    WHERE department='param1' 
      AND type='param2' 
      AND product='param3'
    

    But you have to make sure you don't have any duplicates in the values () list


    The concatenation is also wrong because the concatenated value is something different then comparing each value individually, e.g. ('alexander', 'son') would be treated identical to ('alex', 'anderson')`

    You should use:

    and (first_name, last_name) in ( ('fname1', 'lname1'), ('fname2', 'lname2'))
    

    This can also be written as a join

    SELECT * 
    FROM table t
      JOIN ( values (1),(2),(3) ) as x(code) on x.code = t.product_code
      JOIN ( 
         values ('fname1', 'lname1'), ('fname2', 'lname2') 
      ) as n(fname, lname) on (n.fname, n.lname) = (t.first_name, t.last_name)
    WHERE department='param1' 
      AND type='param2' 
      AND product='param3'