I am having issues running a insert query on 2 large tables. One table is 67,000,000 and the other is 100,000. I am trying to do a LEFT and RIGHT Join on the 2 tables and put the results into another table. The query runs perfect on smaller tables under 1M entries. but when getting to the higher entries it bombs out. I get this error :
Incorrect key file for table 'C:\Windows\TEMP\#sql3838_2_6.MYI'; try to repair it
After reading the solutions online they say to increase the memory utilized by mysql and it's keys for indexing. I have tried this and I am still getting the same issue. I am not sure at this point if it's a bad configuration for mysql or a bar written query.
So I am really looking for a solution of optimizing my query so that it is more memory efficient or a change to my.config to handle the query. Or Splitting the query into 2 different inserts??? would that help?
MySQL Query
INSERT INTO schema.orphan_results (_Doc_ID, Orphan_Entries, Entries_Table, Orphan_File)
SELECT C.A__Doc_ID, C.A_File, C.A_Table, C.B_File
FROM( SELECT A._Doc_ID AS A__Doc_ID, A.File AS A_File, A.Table AS A_Table, B.File AS B_File
FROM schema.Temp_Entries A
LEFT JOIN schema.temp_dir_scan B ON A.File = B.File
UNION SELECT A._Doc_ID as A__Doc_ID, A.File AS A_File, A.Table AS A_Table, B.File AS B_File
FROM schema.Temp_Entries A
RIGHT JOIN schema.temp_dir_scan B ON A.File = B.File) C
WHERE C.A_File IS NULL OR C.B_File IS NULL
Here is my.config for MySql
default-storage-engine=INNODB
max_connections=800
query_cache_size=186M
table_cache=1520
tmp_table_size=900M
thread_cache_size=38
myisam_max_sort_file_size=100G
myisam_sort_buffer_size=268M
key_buffer_size=1160M
read_buffer_size=128K
read_rnd_buffer_size=512K
sort_buffer_size=512K
innodb_additional_mem_pool_size=96M
innodb_buffer_pool_size=563M
My System
16 Gigs of Mem
52 Gigs of Free disk space.
The error message usually results from low disc space, but since 52gigs should be enough (and I assume your filesystem can handle >2gb files) it might be something different.
The following 2 things should work to limit the required temporary space:
You should create an index for temp_dir_scan.File and Temp_Entries.File.
You should use "union all" instead of "union" (or, as you proposed, split the query).
And you could rewrite your code (still, create the index, please):
INSERT INTO schema.orphan_results (_Doc_ID, Orphan_Entries, Entries_Table, Orphan_File)
SELECT A._Doc_ID, A.File, A.Table, null
FROM schema.Temp_Entries A
where not exists (select 1 from schema.temp_dir_scan B where A.File = B.File)
-- or a.file is null -- you might need that if a.file can be null
INSERT INTO schema.orphan_results (_Doc_ID, Orphan_Entries, Entries_Table, Orphan_File)
select null, null, null, B.File
from schema.temp_dir_scan B
where not exists (select 1 from schema.Temp_Entries A where A.File = B.File)
Since UNION has a built-in distinct
(though I am not sure if you are aware of that), you might want to use select distinct A._Doc_ID ...
, but if you don't really need it, don't!