Search code examples
mysqlcollationamazon-aurorautf8mb4

AWS Aurora MySQL 5.7.12 accent sensitive collation


A legacy SQLServer db column has the collation SQL_Latin1_General_CP1_CI_AS (Case Insensitive, Accent Sensitive).

We need to migrate this column into an already created and actively used AWS Aurora MySQL instance.

MySQL 8 has the collation utf8mb4_0900_as_ci (Case Insensitive, Accent Sensitive).

AWS Aurora however currently uses MySQL 5.7.12 which does NOT have utf8mb4_0900_as_ci. MySQL 5.7.12 does have

  • utf8mb4_bin (Case Sensitive, Accent Sensitive as it is comparing binary)
  • utf8mb4_unicode_ci (Case Insensitive, haven't been able figure out if it's Accent Sensitive)
  • utf8mb4_unicode_520_ci (Case Insensitive, haven't been able figure out if it's Accent Sensitive)

Are utf8mb4_unicode_ci or utf8mb4_unicode_520_ci accent sensitive? If yes, can I just use one of them (but which?).

Otherwise, I see no other option but using utf8mb4_bin as it appears to be the only accent sensitive collation available, but because it is Case Sensitive I would have to work around the case sensitivity by eg lowercasing in WHERE clauses, but will that work without unintended undesirable side effects?


Solution

  • 900 refers to Unicode 9.0 (best, but new with 8.0)
    520 refers to Unicode 5.20 (best available in 5.7)
    _unicode_ci refers to Unicode 4.0 (old)
    _general_ci (even less sophisticated)

    _ci (without _as) means case and accent insensitive.
    _bin means case and accent sensitive; the bits are compared.
    MySQL does not provide many collations that mix the sensitivities.

    Use SHOW COLLATION; to see what collations are available in your system.