Search code examples
mysqlsqlmariadbinnodb

Problems with ORDER BY RAND() and big tables


Hello I asked a question this morning, and I realized that the problem was not where I was looking (here the original question)

I have this query to randomly generate registries from an address book.

SELECT * FROM address_book ab 
            WHERE 
            ab.source = "PB" AND 
            ab.city_id = :city_id AND 
            pb_campaign_id = :pb_campaign_id AND 
            ab.id NOT IN (SELECT address_book_id FROM calls WHERE calls.address_book_id = ab.id AND calls.status_id IN ("C","NO") OR (calls.status_id IN ("NR","OC") AND TIMESTAMPDIFF(MINUTE,calls.updated_at,NOW()) < 30))
            ORDER BY RAND()
            LIMIT 1';

but I noticed that "order by rand ()" take more than 50s and use up to 25-50% CPU with large tables (100k +) so i looked for solutions here but i didn't find anything that worked. note: ids are not self-incrementing, there may be gaps

Any idea?


Solution

  • I would recommend writing this as:

    SELECT *
    FROM address_book ab 
    WHERE ab.source = 'PB' AND 
          ab.city_id = :city_id AND 
          pb_campaign_id = :pb_campaign_id AND 
          NOT EXISTS (SELECT 1
                      FROM calls c
                      WHERE c.address_book_id = ab.id AND
                            ( c.status_id IN ('C', 'NO') OR
                             (c.status_id IN ('NR', 'OC') AND c.updated < now() - interval 30 minute)
                            ) 
                    )
    
    ORDER BY RAND()
    LIMIT 1;
    

    Note that this changes the logic in the correlated subquery so c.address_book_id = ab.id always applies. I suspect that is the issue with performance.

    Then, create indexes on:

    • address_book(source, city_id, campaign_id, id)
    • calls(address_book_id, status_id, updated)

    I am guessing that this will be sufficient to improve performance. If there happen to be a zillion rows that match the conditions, then the order by rand() might be an issue.