Search code examples
mysqlpostgresqlmariadbspecial-characterscollation

Unique constraint violation on similar characters


I have a problem with primary keys in MariaDB (I expect it to be the same for MySQL). I'm building a Django app, where some important tables are created by Django with the following SQL:

CREATE TABLE table_name
(
  term     VARCHAR(64) NOT NULL PRIMARY KEY,
  enabled  TINYINT(1)  NOT NULL,
  added    DATE        NOT NULL,
  verified DATE        NOT NULL
) ENGINE = InnoDB;

If I insert two rows in this table where term is ó for the first one, and o for the other, I get the error [23000][1062] Duplicate entry 'o' for key 'PRIMARY'. Same behaviour occurs for other similar letters, like for example e and é. I really find this behaviour weird, as there is no problems with SQLite.

So now the question is: How do I make the database engine see them as different values?

Bonus question: Can I expect the same behaviour from PostgreSQL?

I have tried this in SQL, and not just Django.


Solution

  • Well, as it happens, I answered this question myself while writing it. I realized that I had not tried the combination of the character set utf8mb4 and the collation utf8mb4_bin, and I remembered that the _bin collations compare characters at the binary level, without any rules that makes similar characters seem alike. So creating my database with

    CREATE DATABASE [database_name] CHARACTER SET = 'utf8mb4' COLLATE = 'utf8mb4_bin';
    

    did the trick. Alternatively I could have created each table with

    CREATE TABLE [table_name] CHARACTER SET = 'utf8mb4' COLLATE = 'utf8mb4_bin';
    

    , but setting it on the database is the only viable option here, since Django takes care of table creation.

    Update:

    Regarding the bonus question, the short answer is: No.

    For details, see nick-barnes comment below.

    Update 2:

    Microsoft SQL server (tested on SQL Server 2017 CU3) uses collation to compare strings for equality, so a binary collation, like for instance Latin1_General_BIN, need to be used to address this issue on this server.