Search code examples
mysqlsqlquery-optimization

how to optimize query searching for similar ids referenced in different tables


in my query here https://www.db-fiddle.com/f/nfJzZoYC5gEXLu8hrw4JT2/1

SELECT id, COUNT(DISTINCT tc1.co_id), COUNT(DISTINCT tc2.co_id)
FROM test t
INNER JOIN test_corelation_1 tc1 ON tc1.test_id = t.id AND tc1.co_id IN (
    SELECT co_id 
    FROM test_corelation_1
    WHERE test_id IN (1, 2, 5)
    GROUP BY co_id
)
INNER JOIN test_corelation_2 tc2 ON tc2.test_id = t.id AND tc2.co_id IN (
    SELECT co_id 
    FROM test_corelation_2
    WHERE test_id IN (1, 2, 5)
    GROUP BY co_id
)
GROUP BY t.id
ORDER BY (COUNT(DISTINCT tc1.co_id) + COUNT(DISTINCT tc2.co_id)) ASC;

i am trying getting all the ids from table test that shares similar ids corelated to the ids 1, 2, 3 then sorting it by the least similar by counting it which results in this

id COUNT(DISTINCT tc1.co_id) COUNT(DISTINCT tc2.co_id)
3 1 3
2 3 7
1 5 6

but it gets very very slow the more ids i am checking for its similarities and i do not know how to optimize it further from this and i thought of using CTE but it had same results in the optimizer explain


Solution

  • The main means we have to speed up queries are indexes. But of course we should ensure that our queries are as straight-forward as possible. In your case you don't need the test table at all. Just get your counts from the two other tables and join them. As you are only interested in test IDs that exist in both tables, this is a mere inner join.

    select test_id, c1.count_1, c2.count_2
    from
    (
      select test_id, count(*) as count_1
      from test_corelation_1
      where co_id in (select co_id from test_corelation_1 where test_id in (1, 2, 5))
      group by test_id
    ) c1
    join
    (
      select test_id, count(*) as count_2
      from test_corelation_2
      where co_id in (select co_id from test_corelation_2 where test_id in (1, 2, 5))
      group by test_id
    ) c2 using (test_id)
    order by c1.count_1 + c2.count_2, test_id;
    

    I recommend these indexes for the query:

    create index idx1 on test_corelation_1 (co_id, test_id);
    create index idx2 on test_corelation_2 (co_id, test_id);
    

    (In case the DBMS wants to work with indexes on (co_id, test_id), too, it already has these indexes, as these are the tables' primary keys.)