Search code examples
mysqlsqldatabasesubquerymatomo

sql query with subqueries optimization


I'm a beginner in SQL and I'm having an issue with my subqueries below. The reason I set up the subquery is I need to order and group a column differently than how data is sorted on the main table. Are there any other ways to optimize this query? It has been timing out and has been running for more than 30 minutes. I'd like to avoid rescanning the piwik_log_visit table as much as possible. Is there a way to Order By or Group By specific columns only? I appreciate any help. Thank you!

Set @theDate = cast('2015-11-26 08:00:00' as datetime);

SELECT  t2.idorder AS 'Order ID',
        (
            SELECT  COALESCE(NULLIF(t3.referer_name,''), 'Direct')
            FROM    piwik_log_visit t3
            WHERE   conv(hex(t3.idvisitor), 16, 10) = conv(hex(t2.idvisitor), 16, 10) 
            AND     t3.visit_first_action_time >= (@theDate - INTERVAL 32 DAY)
            ORDER BY t3.visit_last_action_time DESC
            limit 1
        ) AS 'Referrer (Last)',
        (
            SELECT  COALESCE(NULLIF(t4.referer_name,''), 'Direct')
            FROM    piwik_log_visit t4
            WHERE   inet_ntoa(conv(hex(t4.location_ip), 16, 10)) = inet_ntoa(conv(hex(t1.location_ip), 16, 10)) 
            AND     t4.visit_first_action_time >= (@theDate - INTERVAL 32 DAY)
            GROUP BY inet_ntoa(conv(hex(t4.location_ip), 16, 10))
            ORDER BY t4.visit_first_action_time
            limit 1
        ) AS 'Referrer (IP:First)',
        t1.referer_url AS 'Referrer URL'
FROM    piwik_log_visit t1, 
        piwik_log_conversion t2
WHERE   conv(hex(t1.idvisitor), 16, 10) = conv(hex(t2.idvisitor), 16, 10) 
AND     t2.idorder IS NOT NULL 
AND     t2.server_time BETWEEN '2015-11-25 07:59:59' AND '2015-11-26 08:00:00' 
AND     t1.visit_first_action_time >= (@theDate - INTERVAL 32 DAY)
GROUP BY t2.idorder

Explain plan:

http://screencast.com/t/3loUAUkTe


Solution

  • The main reason for the poor performance is because of the join conditions between the tables. They are not optimizable with index due to data conversion on the columns involved. Without index, the joins become Cartesian joins that are then filtered by evaluating the join expression.

    Why do you need to do the conversion on t1.idvisitor and t2.idvisitor? Changing the join conditions to simply

    WHERE t1.idvisitor = t2.idvisitor
    

    will help greatly. Same with t3 and t4.