I have a need to audit some data, however I'm running into a roadblock with a query not returning. What can I do differently here as I truly don't understand whey these queries are returning (executed from mysql workbench) even after letting them run for multiple hours? Am i just under provisioned?
I have a n1-standard 4 google cloud sql instance (4cpu, 15GB ram). The two tables are below. There also exists an index on customer_id for table one. Table 2 has 885481 rows and Table 1 has 1891653 rows.
I've tried three variants of a query to look for customer ids in table one where they don't exist in table 2 (represented as account_group_id).
What I would have expected to be most performant and actually return:
Select customer_id
FROM Table1 as a
WHERE NOT EXISTS(
Select account_group_id
FROM Table2 as b WHERE b.account_group_id = a.customer_id
)
As a sub query:
Select customer_id
FROM Table1
WHERE customer_id NOT IN(
Select account_group_id
FROM Table2
)
As a left join:
SELECT customer_id
FROM Table1 as a
LEFT OUTER JOIN Table2 as b ON a.customer_id = b.account_group_id
WHERE b.account_group_id IS NULL
EDIT: So after some tinkering and actually using EXPLAIN before posting my question, the table2 subquery is performing a FULL TABLE scan for some reason. I've tried this query on my test / staging environments with the same index pattens and I'm seeing an index seek there. Now, I'm even more confused.
Even if I add a force index hint the query optimizer refuses to use the primary key.
This is what the query plan looks like on my staging environment:
Any thoughts as to why this is occurring?
Table1:
Table 2:
Well after much tinkering I completely reworked this query to make the silly optimizer use the index I wanted...must be something to do with the size of the tables:
SELECT a.customer_id
FROM Table1 as a
WHERE a.customer_id NOT IN (
SELECT b.customer_id
FROM Table1 as b
JOIN (select account_group_id from Table2) as x on x.account_group_id = b.customer_id
)