Search code examples
mysqlnode.jsindexingnode-mysql

Using MySQL unique index to prevent duplicates, instead of duplicate searching?


I have a large table (5 million rows), with a unique identifier column called 'unique_id'

I'm running the INSERT query through Node.js (node-mysql bindings) and there's a chance that duplicates could be attempted to be inserted.

The two solutions are:

1) Make 'unique_id' an index, and check the entire database for a duplicate record, prior to INSERT:

'SELECT unique_id WHERE example = "'+unique_id+'" LIMIT 1'

2) Make 'unique_id' a unique index within MySQL, and perform the INSERT without checking for duplicates. Clearly, any duplicates would cause error and not be inserted into the table.

My hunch is that solution 2) is better, as it prevents a search of worse-case (5 million - 1) rows for a duplicate.

Are there any downsides to using solution 2)?


Solution

  • There is a number of advantages to defining a unique, primary index for the unique_id column:

    • Semantic correctness - currently the name does not reflect reality as you can have duplicates in column called 'unique_id',
    • Autogenerating of unique ids - you can delegate this job to the database and avoid conflict of ids (this would not be a problem if you were using UUID instead of integers),
    • Speed gain - to be a reliable solution 1 would require a blocking transaction (no new rows should be inserted between checking for duplicate and inserting a row). Delegating this to MySQL will be much more efficient,
    • Following a common pattern - this is exactly what unique and primary indexes were designed to do. Your solution will be easy to understand to other developers,
    • Less code.

    With the 2nd solution you might need to handle the attempt of inserting a duplicate (unless your unique ids are generated by MySQL).

    Autoincremented primary index: https://dev.mysql.com/doc/refman/5.7/en/example-auto-increment.html