I am working on a system (with Laravel) where users can fill a few filters to get the data they need. Data is not prepared real time, once the filters are set, a job is pushed to the queue and once the query finishes a CSV file is created. Then the user receives an email with the file which was created so that they can download it.
I have seen some errors in the jobs where it took longer than 30 mins to process one job and when I checked I have seen some users created filter with more than 600 values.
This filter values are translated like this:
SELECT filed1,
field2,
field6
FROM table
INNER JOIN table2
ON table.id = table2.cid
/* this is how we try not to give same data to the users again so we used NOT IN */
WHERE table.id NOT IN(SELECT data_id
FROM data_access
WHERE data_user = 26)
AND ( /* this bit is auto populated with the filter values */
table2.filed_a = 'text a'
OR table2.filed_a = 'text b'
OR table2.filed_a = 'text c' )
Well I was not expecting users to go wild and fine tune with a huge filter set. It is okay for them to do this but need a solution to make this query quicker.
One way is to create a temp table on the fly with the filter values and covert the query for INNER JOIN
but not sure if it would increase the performance.
Also, given that in a normal day system would need to create at least 40-ish temp tables and delete them afterwards. Would this become another issue in the long run?
I would love to hear any other suggestions that may help me solve this issue other then temp table method.
I would suggest writing the query like this:
SELECT ?.filed1, ?.field2, ?.field6 -- qualify column names (but no effect on performance)
FROM table t JOIN
table2 t2
ON t.id = t2.cid
WHERE NOT EXISTS (SELECT 1
FROM data_access da
WHERE t.id = da.data_id AND da.data_user = 26
) AND
t2.filed_a IN ('text a', 'text b', 'text c') ;
Then I would recommend indexes. Most likely:
table2(filed_a, cid)
table1(id)
(may not be necessary if id
is already the primary key)data_access(data_id, data_user)
You can test this as your own query. I don't know how to get Laravel to produce this (assuming it meets your performance objectives).