mysql django character-encoding

Django query does not consider special characters


I've come across a problem with special characters in a Django query (Django 1.9.2).

I've created a model that stores a word, and I'm populating it with words from a Spanish dictionary, using code like this:

MyModel.objects.get_or_create(word=myword)

I've now realized that some words containing special characters were not added. For example, there is only one row of MyModel in the database for año and ano! And when I query the database, both of these queries return the same object:

MyModel.objects.get(word='año')
MyModel.objects.get(word='ano')

...and no, those words are not the same ;D

I want to create one object for each, of course.


Solution

  • Short answer: You probably want COLLATION utf8_spanish2_ci.

    Long answer:

    If you are using CHARACTER SET utf8 (or utf8mb4) on the column or table in question, and you need ano != año, you need COLLATION utf8_bin, utf8_spanish_ci, or utf8_spanish2_ci (or the utf8mb4_ equivalents if the character set is utf8mb4). All other utf8 collations treat n = ñ. spanish2 differs from spanish in that ch is treated as a separate "letter" between c and d, and likewise ll between l and m.

    Note that other 'accents' are ignored in comparisons for most utf8 collations except for utf8_bin. For example, C = Ç (except for _bin and _turkish).
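As a rough sketch of how the collation change above could be applied, the helper below builds the ALTER TABLE statement in Python. The table name myapp_mymodel, the column type VARCHAR(100), and the NOT NULL setting are assumptions for illustration and are not taken from the question; check the real definition with SHOW CREATE TABLE first, because MODIFY redefines the whole column.

```python
# Sketch only: builds the SQL text, it does not connect to a database.
# "myapp_mymodel" and "VARCHAR(100)" are hypothetical; use the values
# reported by SHOW CREATE TABLE for the real table.

def collation_sql(table, column, column_type,
                  charset="utf8", collation="utf8_spanish2_ci",
                  not_null=True):
    """Return an ALTER TABLE statement that sets one column's collation."""
    null_clause = "NOT NULL" if not_null else "NULL"
    return (
        "ALTER TABLE `{table}` MODIFY `{column}` {column_type} "
        "CHARACTER SET {charset} COLLATE {collation} {null_clause}"
    ).format(
        table=table,
        column=column,
        column_type=column_type,
        charset=charset,
        collation=collation,
        null_clause=null_clause,
    )


sql = collation_sql("myapp_mymodel", "word", "VARCHAR(100)")
print(sql)
```

The resulting statement can be run from the mysql client, or in a Django project it could be wrapped in a migrations.RunSQL operation. Once the column uses a Spanish (or _bin) collation, año and ano no longer compare equal, so get_or_create should create a separate row for each.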