A legacy SQLServer db column has the collation SQL_Latin1_General_CP1_CI_AS (Case Insensitive, Accent Sensitive).
We need to migrate this column into an already created and actively used AWS Aurora MySQL instance.
MySQL 8 has the collation utf8mb4_0900_as_ci (Case Insensitive, Accent Sensitive).
AWS Aurora however currently uses MySQL 5.7.12 which does NOT have utf8mb4_0900_as_ci. MySQL 5.7.12 does have
Are utf8mb4_unicode_ci or utf8mb4_unicode_520_ci accent sensitive? If yes, can I just use one of them (but which?).
Otherwise, I see no other option but using utf8mb4_bin as it appears to be the only accent sensitive collation available, but because it is Case Sensitive I would have to work around the case sensitivity by eg lowercasing in WHERE clauses, but will that work without unintended undesirable side effects?
900 refers to Unicode 9.0 (best, but new with 8.0)
520 refers to Unicode 5.20 (best available in 5.7)
_unicode_ci
refers to Unicode 4.0 (old)
_general_ci
(even less sophisticated)
_ci
(without _as
) means case and accent insensitive.
_bin
means case and accent sensitive; the bits are compared.
MySQL does not provide many collations that mix the sensitivities.
Use SHOW COLLATION;
to see what collations are available in your system.