Search code examples
mysqlmemory-managementinsertleft-joinright-join

MySQL Insert from results of left & Right Join results in memory issue


I am having issues running a insert query on 2 large tables. One table is 67,000,000 and the other is 100,000. I am trying to do a LEFT and RIGHT Join on the 2 tables and put the results into another table. The query runs perfect on smaller tables under 1M entries. but when getting to the higher entries it bombs out. I get this error :

Incorrect key file for table 'C:\Windows\TEMP\#sql3838_2_6.MYI'; try to repair it

After reading the solutions online they say to increase the memory utilized by mysql and it's keys for indexing. I have tried this and I am still getting the same issue. I am not sure at this point if it's a bad configuration for mysql or a bar written query.

So I am really looking for a solution of optimizing my query so that it is more memory efficient or a change to my.config to handle the query. Or Splitting the query into 2 different inserts??? would that help?

MySQL Query

INSERT INTO schema.orphan_results (_Doc_ID, Orphan_Entries, Entries_Table, Orphan_File) 
    SELECT C.A__Doc_ID, C.A_File, C.A_Table, C.B_File 
    FROM( SELECT A._Doc_ID AS A__Doc_ID, A.File AS A_File, A.Table AS A_Table, B.File AS B_File 
    FROM schema.Temp_Entries A 
    LEFT JOIN schema.temp_dir_scan B ON A.File = B.File 
    UNION SELECT A._Doc_ID as A__Doc_ID, A.File AS A_File, A.Table AS A_Table, B.File AS B_File 
    FROM schema.Temp_Entries A 
    RIGHT JOIN schema.temp_dir_scan B ON A.File = B.File) C 
WHERE C.A_File IS NULL OR C.B_File IS NULL

Here is my.config for MySql

default-storage-engine=INNODB
max_connections=800
query_cache_size=186M
table_cache=1520
tmp_table_size=900M
thread_cache_size=38
myisam_max_sort_file_size=100G
myisam_sort_buffer_size=268M
key_buffer_size=1160M
read_buffer_size=128K
read_rnd_buffer_size=512K
sort_buffer_size=512K
innodb_additional_mem_pool_size=96M
innodb_buffer_pool_size=563M

My System

16 Gigs of Mem
52 Gigs of Free disk space.

Solution

  • The error message usually results from low disc space, but since 52gigs should be enough (and I assume your filesystem can handle >2gb files) it might be something different.

    The following 2 things should work to limit the required temporary space:

    • You should create an index for temp_dir_scan.File and Temp_Entries.File.

    • You should use "union all" instead of "union" (or, as you proposed, split the query).

    And you could rewrite your code (still, create the index, please):

    INSERT INTO schema.orphan_results (_Doc_ID, Orphan_Entries, Entries_Table, Orphan_File) 
    SELECT A._Doc_ID, A.File, A.Table, null 
    FROM schema.Temp_Entries A 
    where not exists (select 1 from schema.temp_dir_scan B where A.File = B.File)   
    -- or a.file is null   -- you might need that if a.file can be null
    
    INSERT INTO schema.orphan_results (_Doc_ID, Orphan_Entries, Entries_Table, Orphan_File) 
    select null, null, null, B.File 
    from schema.temp_dir_scan B 
    where not exists (select 1 from schema.Temp_Entries A where A.File = B.File) 
    

    Since UNION has a built-in distinct (though I am not sure if you are aware of that), you might want to use select distinct A._Doc_ID ..., but if you don't really need it, don't!