Search code examples
mysqlsqlgoogle-cloud-sql

Google Cloud SQL selecting rows that don't exist within another table


I have a need to audit some data, however I'm running into a roadblock with a query not returning. What can I do differently here as I truly don't understand whey these queries are returning (executed from mysql workbench) even after letting them run for multiple hours? Am i just under provisioned?

I have a n1-standard 4 google cloud sql instance (4cpu, 15GB ram). The two tables are below. There also exists an index on customer_id for table one. Table 2 has 885481 rows and Table 1 has 1891653 rows.

I've tried three variants of a query to look for customer ids in table one where they don't exist in table 2 (represented as account_group_id).

What I would have expected to be most performant and actually return:

Select customer_id
FROM Table1 as a
WHERE NOT EXISTS(
    Select account_group_id
    FROM Table2 as b WHERE b.account_group_id = a.customer_id
)

As a sub query:

Select customer_id
FROM Table1
WHERE customer_id NOT IN(
    Select account_group_id
    FROM Table2
)

As a left join:

SELECT customer_id
FROM Table1 as a
LEFT OUTER JOIN Table2 as b ON a.customer_id = b.account_group_id
WHERE b.account_group_id IS NULL

EDIT: So after some tinkering and actually using EXPLAIN before posting my question, the table2 subquery is performing a FULL TABLE scan for some reason. I've tried this query on my test / staging environments with the same index pattens and I'm seeing an index seek there. Now, I'm even more confused. full table scan

Even if I add a force index hint the query optimizer refuses to use the primary key.

This is what the query plan looks like on my staging environment:

enter image description here

Any thoughts as to why this is occurring?

Table1:

Table 1

Table 2:

Table 2


Solution

  • Well after much tinkering I completely reworked this query to make the silly optimizer use the index I wanted...must be something to do with the size of the tables:

    SELECT a.customer_id
    FROM Table1 as a
    WHERE a.customer_id NOT IN (
        SELECT b.customer_id
        FROM Table1 as b
        JOIN (select account_group_id from Table2) as x on x.account_group_id = b.customer_id
    )