Search code examples
mysqlsqlquery-performance

Optimising a SQL query with a huge where clause


I am working on a system (with Laravel) where users can fill a few filters to get the data they need. Data is not prepared real time, once the filters are set, a job is pushed to the queue and once the query finishes a CSV file is created. Then the user receives an email with the file which was created so that they can download it.

I have seen some errors in the jobs where it took longer than 30 mins to process one job and when I checked I have seen some users created filter with more than 600 values.

This filter values are translated like this:

SELECT filed1, 
       field2, 
       field6 
FROM   table 
       INNER JOIN table2 
               ON table.id = table2.cid 
/* this is how we try not to give same data to the users again so we used NOT IN */ 
WHERE  table.id NOT IN(SELECT data_id 
                       FROM   data_access 
                       WHERE  data_user = 26) 
       AND ( /* this bit is auto populated with the filter values */ 
           table2.filed_a = 'text a' 
            OR table2.filed_a = 'text b' 
            OR table2.filed_a = 'text c' ) 

Well I was not expecting users to go wild and fine tune with a huge filter set. It is okay for them to do this but need a solution to make this query quicker.

One way is to create a temp table on the fly with the filter values and covert the query for INNER JOIN but not sure if it would increase the performance. Also, given that in a normal day system would need to create at least 40-ish temp tables and delete them afterwards. Would this become another issue in the long run?

I would love to hear any other suggestions that may help me solve this issue other then temp table method.


Solution

  • I would suggest writing the query like this:

    SELECT ?.filed1, ?.field2, ?.field6  -- qualify column names (but no effect on performance)
    FROM table t JOIN
         table2 t2
         ON t.id = t2.cid 
    
    WHERE NOT EXISTS (SELECT 1
                      FROM data_access da
                      WHERE t.id = da.data_id AND da.data_user = 26
                     ) AND
         t2.filed_a IN ('text a', 'text b', 'text c') ;
    

    Then I would recommend indexes. Most likely:

    • table2(filed_a, cid)
    • table1(id) (may not be necessary if id is already the primary key)
    • data_access(data_id, data_user)

    You can test this as your own query. I don't know how to get Laravel to produce this (assuming it meets your performance objectives).