I am working on a MySQL query that has to go through 1+ million rows in table A and 5+ million rows in table B. The query selects all unique people from table A and joins each person to their sales in table B. Indexes are set on both tables where necessary.
The goal is to load every unique email address from table A, together with its data from table B, into table C.
I am looking for the most efficient way to do this. Below is the SELECT part of the REPLACE INTO query I am using; it pulls the records whose products are in the specified IDs. The filter is a simple CASE statement with an IN condition.
I also have the exact same query using NOT IN instead of IN in the subquery. That one times out.
I look forward to any help anyone can provide, and hopefully a more efficient way of doing this.
SELECT
c.Email,
MAX(c.Birthdate),
UPPER(c.Deleted) AS Deleted,
UPPER(c.Inactive) AS Inactive,
UPPER(c.SendEmail) AS SendEmail,
CONCAT('[',GROUP_CONCAT(DISTINCT c.Site SEPARATOR ']['),']') AS SITEID,
CONCAT('[',GROUP_CONCAT(DISTINCT c.Studio SEPARATOR ']['),']') AS STUDIO,
( SELECT CONCAT('[',GROUP_CONCAT(DISTINCT s2.Service SEPARATOR ']['),']')
FROM my_example c2
INNER JOIN my_example_sales s2 ON s2.idMember = c2.idMember AND s2.Site = c2.Site WHERE c2.Email = c.Email
) as SERVICES,
( SELECT MAX(date(s1.Date)) as Date
FROM my_example_sales s1
WHERE s1.idMember = c.idMember
AND s1.Site = c.Site
AND (CASE WHEN s1.Site = '1' THEN s1.ProductID IN ('1','6','7','12','18','22')
WHEN s1.Site = '2' THEN s1.ProductID = '156'
WHEN s1.Site = '3' THEN s1.ProductID IN ('3','5','6')
WHEN s1.Site = '4' THEN s1.ProductID IN ('11','15')
WHEN s1.Site = '5' THEN s1.ProductID = '23'
WHEN s1.Site = '6' THEN s1.ProductID = '23'
WHEN s1.Site = '7' THEN s1.ProductID = '23'
WHEN s1.Site = '8' THEN s1.ProductID = '23'
WHEN s1.Site = '9' THEN s1.ProductID = '23'
WHEN s1.Site = '10' THEN s1.ProductID IN ('7','11','17','30','31')
WHEN s1.Site = '11' THEN s1.ProductID = '23'
WHEN s1.Site = '12' THEN s1.ProductID IN ('7','11','17','30','31')
WHEN s1.Site = '13' THEN s1.ProductID = '23' END)
ORDER BY s1.Date DESC
LIMIT 0,1
) as lastPurchaseFreeWeek,
NOW() as dateModified
FROM my_example c
WHERE c.Email !=''
GROUP BY c.Email
ORDER BY c.ModDate DESC
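Moving your correlated subselects into joined derived tables should yield some performance increase, because each derived table can be built once instead of the subselect being re-run for every row: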
SELECT
c.Email,
MAX(c.Birthdate),
UPPER(c.Deleted) AS Deleted,
UPPER(c.Inactive) AS Inactive,
UPPER(c.SendEmail) AS SendEmail,
CONCAT('[',GROUP_CONCAT(DISTINCT c.Site SEPARATOR ']['),']') AS SITEID,
CONCAT('[',GROUP_CONCAT(DISTINCT c.Studio SEPARATOR ']['),']') AS STUDIO,
c2.SERVICES,
MAX(s1.`Date`) as lastPurchaseFreeWeek,
NOW() as dateModified
FROM my_example c
INNER JOIN (
-- Email must be selected and grouped so the outer ON clause can reference it
SELECT c2.Email, CONCAT('[',GROUP_CONCAT(DISTINCT s2.Service SEPARATOR ']['),']') AS SERVICES
FROM my_example c2
INNER JOIN my_example_sales s2 ON s2.idMember = c2.idMember AND s2.Site = c2.Site
GROUP BY c2.Email
) c2 ON c2.Email = c.Email
INNER JOIN (
-- idMember and Site must be selected and grouped so the outer ON clause can reference them
SELECT s1.idMember, s1.Site, MAX(date(s1.Date)) as `Date`
FROM my_example_sales s1
WHERE (CASE WHEN s1.Site = '1' THEN s1.ProductID IN ('1','6','7','12','18','22')
WHEN s1.Site = '2' THEN s1.ProductID = '156'
WHEN s1.Site = '3' THEN s1.ProductID IN ('3','5','6')
WHEN s1.Site = '4' THEN s1.ProductID IN ('11','15')
WHEN s1.Site = '5' THEN s1.ProductID = '23'
WHEN s1.Site = '6' THEN s1.ProductID = '23'
WHEN s1.Site = '7' THEN s1.ProductID = '23'
WHEN s1.Site = '8' THEN s1.ProductID = '23'
WHEN s1.Site = '9' THEN s1.ProductID = '23'
WHEN s1.Site = '10' THEN s1.ProductID IN ('7','11','17','30','31')
WHEN s1.Site = '11' THEN s1.ProductID = '23'
WHEN s1.Site = '12' THEN s1.ProductID IN ('7','11','17','30','31')
WHEN s1.Site = '13' THEN s1.ProductID = '23' END)
GROUP BY s1.idMember, s1.Site
) s1 ON ( s1.idMember = c.idMember AND s1.Site = c.Site )
WHERE c.Email !=''
GROUP BY c.Email
ORDER BY c.ModDate DESC
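If the CASE filter itself turns out to be the slow part, one option worth trying is to move the Site/ProductID pairs into a small lookup table and join against it, so the filter becomes an indexed join rather than an expression evaluated per row. This is only a sketch: the table name free_week_products and its column types are assumptions (they are not in your schema), and the types and collations should match my_example_sales for the join to use an index.

```sql
-- Hypothetical lookup table: one row per (Site, ProductID) pair from the CASE.
-- Table name and column types are assumptions; match them to my_example_sales.
CREATE TABLE free_week_products (
  Site      VARCHAR(10) NOT NULL,
  ProductID VARCHAR(10) NOT NULL,
  PRIMARY KEY (Site, ProductID)
);

INSERT INTO free_week_products (Site, ProductID) VALUES
  ('1','1'), ('1','6'), ('1','7'), ('1','12'), ('1','18'), ('1','22'),
  ('2','156'),
  ('3','3'), ('3','5'), ('3','6'),
  ('4','11'), ('4','15'),
  ('5','23'), ('6','23'), ('7','23'), ('8','23'), ('9','23'),
  ('10','7'), ('10','11'), ('10','17'), ('10','30'), ('10','31'),
  ('11','23'),
  ('12','7'), ('12','11'), ('12','17'), ('12','30'), ('12','31'),
  ('13','23');

-- IN variant: latest qualifying purchase per member and site.
SELECT s1.idMember, s1.Site, MAX(DATE(s1.Date)) AS lastPurchaseFreeWeek
FROM my_example_sales s1
INNER JOIN free_week_products f
  ON f.Site = s1.Site AND f.ProductID = s1.ProductID
GROUP BY s1.idMember, s1.Site;

-- NOT IN variant written as an anti-join: keep sales with no matching pair.
-- Unlike the CASE version, this also keeps sales from sites that have no
-- rows in the lookup table; add a Site filter if that is not wanted.
SELECT s1.idMember, s1.Site, MAX(DATE(s1.Date)) AS lastPurchaseOther
FROM my_example_sales s1
LEFT JOIN free_week_products f
  ON f.Site = s1.Site AND f.ProductID = s1.ProductID
WHERE f.Site IS NULL
GROUP BY s1.idMember, s1.Site;
```

Either SELECT can take the place of the s1 derived table in the query above. Run EXPLAIN on both versions to confirm that an index on my_example_sales is actually used before relying on this.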