mysql query-optimization

MySQL query takes 3 hours to run and process


I have a query that is run by a cron job late at night. Its results are then processed through a generator, because they populate another database and I perform some additional processing and checks before the data is sent to the other DB.

I am wondering whether there is any way to speed up this query, ideally while keeping it as a single query. Or will I be forced to create other queries and join the data within PHP? The query runs against the main Mautic database.

SELECT  c.id as "campaign_id",
        c.created_by_user,
        c.name,
        c.date_added,
        c.date_modified,
        (SELECT DISTINCT COUNT(cl.lead_id)) as number_of_leads,
        GROUP_CONCAT(lt.tag) as tags,
        cat.title as category_name,
        GROUP_CONCAT(ll.name) as segment_name,
        GROUP_CONCAT(emails.name) as email_name,
        CASE WHEN c.is_published = 1 THEN "Yes" ELSE "No" END AS "published",
        CASE WHEN c.publish_down > now() THEN "Yes" 
             WHEN c.publish_down > now() AND c.is_published = 0 THEN "Yes" 
             ELSE "No" END AS "expired"
FROM campaigns c 
    LEFT JOIN campaign_leads cl ON cl.campaign_id = c.id
    LEFT JOIN lead_tags_xref ltx on cl.lead_id = ltx.lead_id 
    LEFT JOIN lead_tags lt on ltx.tag_id = lt.id 
    LEFT JOIN categories cat on c.category_id = cat.id 
    LEFT JOIN lead_lists_leads llist on cl.lead_id = llist.lead_id 
    LEFT JOIN lead_lists ll on llist.leadlist_id = ll.id 
    LEFT JOIN email_list_xref el on ll.id = el.leadlist_id 
    LEFT JOIN emails on el.email_id = emails.id 
GROUP BY c.id;

Here is an image of the EXPLAIN output: https://prnt.sc/qQtUaLK3FIpQ

Table definitions:

campaigns table https://prnt.sc/6JXRGyMsWpcd

campaign_leads table https://prnt.sc/pOq0_SxW2spe

lead_tags_xref table https://prnt.sc/oKYn92O82gHL

lead_tags table https://prnt.sc/ImH81ECF6Ly1

categories table https://prnt.sc/azQj_Xwq3dw9

lead_lists_leads table https://prnt.sc/x5C5fiBFP2N7

lead_lists table https://prnt.sc/bltkM0f3XeaH

email_list_xref table https://prnt.sc/kXABVJSYWEUI

emails table https://prnt.sc/7fZcBir1a6QT

I am only expecting 871 rows to be returned. I have identified that the joins can be very large, in the tens of thousands.


Solution

  • It seems you have a useless SELECT DISTINCT; you are probably looking for COUNT(DISTINCT ...).
    This way you avoid a nested SELECT for each row of the main SELECT:

    SELECT  c.id as "campaign_id",
            c.created_by_user,
            c.name,
            c.date_added,
            c.date_modified,
            COUNT(DISTINCT cl.lead_id) as number_of_leads,
            GROUP_CONCAT(lt.tag) as tags,
            cat.title as category_name,
            GROUP_CONCAT(ll.name) as segment_name,
            GROUP_CONCAT(emails.name) as email_name,
            CASE WHEN c.is_published = 1 THEN "Yes" ELSE "No" END AS "published",
            CASE WHEN c.publish_down > now() THEN "Yes" 
                 WHEN c.publish_down > now() AND c.is_published = 0 THEN "Yes" 
                 ELSE "No" END AS "expired"
    FROM campaigns c 
        LEFT JOIN campaign_leads cl ON cl.campaign_id = c.id
        LEFT JOIN lead_tags_xref ltx on cl.lead_id = ltx.lead_id 
        LEFT JOIN lead_tags lt on ltx.tag_id = lt.id 
        LEFT JOIN categories cat on c.category_id = cat.id 
        LEFT JOIN lead_lists_leads llist on cl.lead_id = llist.lead_id 
        LEFT JOIN lead_lists ll on llist.leadlist_id = ll.id 
        LEFT JOIN email_list_xref el on ll.id = el.leadlist_id 
        LEFT JOIN emails on el.email_id = emails.id 
    GROUP BY c.id;
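
    To illustrate the difference, here is a minimal sketch using a made-up three-row derived table (the alias t and its values are assumptions for illustration, not data from the question). DISTINCT on the SELECT only removes duplicate result rows, so it has no effect on a single count, whereas COUNT(DISTINCT ...) counts each lead once:

    ```sql
    -- Hypothetical sample data: lead 1 appears twice, lead 2 once.
    SELECT DISTINCT COUNT(t.lead_id) AS number_of_leads
    FROM (SELECT 1 AS lead_id UNION ALL SELECT 1 UNION ALL SELECT 2) AS t;
    -- returns 3: DISTINCT applies to the single result row, duplicates are still counted

    SELECT COUNT(DISTINCT t.lead_id) AS number_of_leads
    FROM (SELECT 1 AS lead_id UNION ALL SELECT 1 UNION ALL SELECT 2) AS t;
    -- returns 2: each lead_id is counted once
    ```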
    

    In any case, make sure you have a proper composite index on each of the following:

    table campaign_leads columns campaign_id, lead_id
    table lead_tags_xref columns lead_id, tag_id
    table lead_lists_leads columns lead_id, leadlist_id
    table email_list_xref columns leadlist_id, email_id
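
    The indexes listed above could be created along these lines. This is only a sketch: the index names are made up, the table names assume no table prefix, and some of these column pairs may already be covered by an existing primary key or index, so it is worth checking with SHOW INDEX first.

    ```sql
    -- Check which indexes already exist before adding anything
    -- (repeat for each of the four tables).
    SHOW INDEX FROM campaign_leads;

    -- Index names (idx_...) are hypothetical; adjust the table names
    -- if your installation uses a table prefix.
    CREATE INDEX idx_cl_campaign_lead ON campaign_leads (campaign_id, lead_id);
    CREATE INDEX idx_ltx_lead_tag ON lead_tags_xref (lead_id, tag_id);
    CREATE INDEX idx_lll_lead_list ON lead_lists_leads (lead_id, leadlist_id);
    CREATE INDEX idx_el_list_email ON email_list_xref (leadlist_id, email_id);
    ```

    After creating them, re-run EXPLAIN on the query to confirm that the joins actually use the new indexes.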