mysql text blob innodb query-performance

Most efficient way to compare large TEXT values in MySQL using the InnoDB engine


I have a staging table like this:

CREATE TABLE `staging` (
  `created_here_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `desc_text`  TEXT NOT NULL );

And a destination table like this:

CREATE TABLE `final_tbl` (
  `row_id` BIGINT NOT NULL AUTO_INCREMENT,
  `created_here_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `desc_text`  TEXT NOT NULL,
  PRIMARY KEY (`row_id`) );  -- MySQL requires an AUTO_INCREMENT column to be a key

I want to insert desc_text into final_tbl only if it is not already present there. I am considering two options:

  1. Check whether staging.desc_text exists in final_tbl.desc_text, and insert into final_tbl if it does not.
  2. Maintain a column in final_tbl that stores the SHA224 value of the desc_text column. Compare the SHA224 value of staging.desc_text with the SHA224 column in final_tbl, then decide whether to insert or ignore.

Which option will be faster?


Solution

  • Hmmm . . .

    Create the sha224 column on final_tbl, with an index:

    create index unq_final_tbl_sha224 on final_tbl(sha224);
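
    The statement above only creates the index, so the column has to exist first. A minimal sketch of adding it, to be run before the create index statement; the CHAR(56) type (a SHA-224 digest is 56 hex characters) and the one-time backfill are assumptions, not part of the original answer:

    ```sql
    -- Assumed column definition: 56 hex characters hold a SHA-224 digest.
    ALTER TABLE final_tbl
        ADD COLUMN sha224 CHAR(56) NULL;

    -- Backfill rows that were inserted before the column existed.
    UPDATE final_tbl
    SET sha224 = SHA2(desc_text, 224)
    WHERE sha224 IS NULL;
    ```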
    

    Then do an insert like this:

    insert into final_tbl(desc_text, sha224)
        select *
        from (select desc_text, sha2(desc_text, 224) as sha224  -- staging has no sha224 column, so compute it
              from staging s
              where not exists (select 1 from final_tbl f where f.sha224 = sha2(s.desc_text, 224))
             ) s
        where not exists (select 1 from final_tbl f where f.desc_text = s.desc_text);
    

    The subquery is there to make absolutely sure that MySQL does not compare the long form of the field before it compares the hash value. It is probably safe to combine the two conditions with AND and drop the subquery, but the version above is more conservative.
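
    For comparison, a sketch of the single-query form mentioned in the last sentence, with both conditions joined by AND. It assumes staging has no sha224 column (as in the question's schema), so the hash is computed inline with SHA2(). The order in which the optimizer evaluates the two checks is not guaranteed:

    ```sql
    INSERT INTO final_tbl (desc_text, sha224)
    SELECT s.desc_text, SHA2(s.desc_text, 224)
    FROM staging s
    -- Cheap check first: an indexed lookup on the hash column.
    WHERE NOT EXISTS (SELECT 1 FROM final_tbl f
                      WHERE f.sha224 = SHA2(s.desc_text, 224))
    -- Full comparison of the TEXT values, as in the original query.
      AND NOT EXISTS (SELECT 1 FROM final_tbl f
                      WHERE f.desc_text = s.desc_text);
    ```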