I have a column called username
, I want the user to be able to insert text in japanese, roman, arabic, korean, and everything that is possible, including special chars [https://en.wiktionary.org/wiki/Index:All_languages], what COLLATE
should I set on my database and tables?
I'm using utf_general_ci
, I'm new so I don't know if this is the best COLLATE
for my needs. I need to choose the right COLLATE
to avoid sql error, because I will not use preg_replace
or a function to replace special chars, I will only use prepared statement
to avoid SQL injection and protect by database.
utf8mb4_0900_ai_ci
utf8mb4_unicode_520_ci
utf8mb4_unicode_ci
utf8_unicode_ci
The numbers refer to Unicode standards 9.0, 5.20, and (no number) 4.0.
No collation is good for sorting all languages at the same time. Spanish, German, Turkish, etc, have quirks that are incompatible. The collations above are the 'best' general purpose ones available.
utf8mb4 handles all characters yet specified by Unicode (including Cherokee, Klingon, Cuneiform, Byzantine, etc.)
If Portuguese is the focus:
See https://pt.stackoverflow.com/ and MySQL collation for Portugese .
Study this for 8.0 or this for pre 8.0 to see which utf8/utf8mb4 collation comes closest to sorting Portuguese 'correctly'. Perhaps utf8mb4_danish_ci
or utf8mb4_de_pb_0900_ai_ci
would be best.
(Else go with the 'choices' listed above.)