Search code examples
mysqlcollate

MySQL Collate Query Running Forever


I have a particular query in MySQL where it compares values from two different table (indexed) using collate but this does not execute for hours. The query is provided below:

create temporary table elig_temp
select id from table_elig;
create index elig_temp on elig_temp(id);

create temporary table med_temp
select id from table_med;
create index med_temp on med_temp(id);

select COUNT(1) as result 
from med_temp a 
where exists 
      (
       select 1 
       from elig_temp b 
       where a.id collate latin1_general_cs = b.id collate latin1_general_cs
      )

FYI The elig_temp table has 70k records whereas the med_temp has 1 million records.
Also, the id fields of table_elig and table_med tables are the hash encrypted values of another field from the same table. Therefore, I tried to use binary collations techniques too, such as udf8_bin and latin1_bin to make the query run but I am stuck again.

I have even tried by defining with the same collation techniques, that I used with query, for each fields (varchar and char) of table_med and table_elig but no luck.

Please suggest me with any possible solution for executing this query time efficiently.


Solution

  • When you explicitly set a collate, then MySQL cannot use indexes on the column. This is unfortunate.

    First, does the query run without collations?

    select COUNT(1) as result 
    from med_temp a 
    where exists (select 1 from elig_temp b where a.id  = b.id collate );
    

    That is the simplest solution.

    The next solution is to make the collations the same when you create the tables:

    create temporary table elig_temp
        select id collate latin1_general_cs as id
        from table_elig;
    create index elig_temp on elig_temp(id);
    
    create temporary table med_temp
        select id collate latin1_general_cs as id
        from table_med;
    create index med_temp on med_temp(id);
    

    Then you can run the above query -- without the explicit collations -- and it should use the indexes.