Hello I asked a question this morning, and I realized that the problem was not where I was looking (here the original question)
I have this query to randomly generate registries from an address book.
SELECT * FROM address_book ab
WHERE
ab.source = "PB" AND
ab.city_id = :city_id AND
pb_campaign_id = :pb_campaign_id AND
ab.id NOT IN (SELECT address_book_id FROM calls WHERE calls.address_book_id = ab.id AND calls.status_id IN ("C","NO") OR (calls.status_id IN ("NR","OC") AND TIMESTAMPDIFF(MINUTE,calls.updated_at,NOW()) < 30))
ORDER BY RAND()
LIMIT 1';
but I noticed that "order by rand ()" take more than 50s and use up to 25-50% CPU with large tables (100k +) so i looked for solutions here but i didn't find anything that worked. note: ids are not self-incrementing, there may be gaps
Any idea?
I would recommend writing this as:
SELECT *
FROM address_book ab
WHERE ab.source = 'PB' AND
ab.city_id = :city_id AND
pb_campaign_id = :pb_campaign_id AND
NOT EXISTS (SELECT 1
FROM calls c
WHERE c.address_book_id = ab.id AND
( c.status_id IN ('C', 'NO') OR
(c.status_id IN ('NR', 'OC') AND c.updated < now() - interval 30 minute)
)
)
ORDER BY RAND()
LIMIT 1;
Note that this changes the logic in the correlated subquery so c.address_book_id = ab.id
always applies. I suspect that is the issue with performance.
Then, create indexes on:
address_book(source, city_id, campaign_id, id)
calls(address_book_id, status_id, updated)
I am guessing that this will be sufficient to improve performance. If there happen to be a zillion rows that match the conditions, then the order by rand()
might be an issue.