I'm confused about the MySQL Collations and its Encodings, People usualy uses one of these three collations:
What I don't understand is when to use each of these collations when needed, Like for example,
A table for names like this
[id - name]
It would only has names characters from different languages like french, german, latin. . .
Do I use for such a table the collation of utf8mb_bin
or stick with utf8mb4_unicode_520_ci
,
In other hand, A table for topics of a blog for example:
[id - title - subject]
Do I put all the columns collation to utf8mb4_unicode_520_ci
or use:
utf8mb_bin
for title
utf8mb4_unicode_520_ci
for subject
Since as I understood, the utf8mb4_unicode_520_ci
has some emotes in it that would be used in blogs subjects, Or do I just ignore everything and just use utf8mb4_unicode_520_ci
on all?
But overall, What is the point of using these different collations? And How does they affect my results in SELECT
queries?
What I would like to know in berif is:
What collation to be used for each of:
You're confusing encoding and collation.
The available characters are defined by the encoding (and only the encoding). Since UTF-8 is a Unicode-compatible encoding, you have all characters. The MySQL peculiarity is that its utf8
encoding does not really implement UTF-8 but only a subset because it allocates 3 bytes per character and (as of today) some characters need 4 bytes. Thus utf8mb4
was born.
Collation is a set of rules that tell you how WHERE foo = bar
and ORDER BY foo
work. You need to ask yourself: if I search for internet
, should it match Internet
? If you store French, German and Latin words you most likely don't want a binary collation. Ideally you want one with the exact rules of the language you'll be using but, since you're mixing languages, you'll have to opt for a generic collation. You can make an informed decision after reading Difference between utf8mb4_unicode_ci and utf8mb4_unicode_520_ci collations.