Search code examples
mysqlcharacter-encodingcollationutf8mb4

When to use utf8mb4 (bin, general_ci, unicode_520_ci)?


I'm confused about the MySQL Collations and its Encodings, People usualy uses one of these three collations:

  1. utf8mb_bin
  2. utf8mb4_general_ci
  3. utf8mb4_unicode_520_ci

What I don't understand is when to use each of these collations when needed, Like for example,

A table for names like this

[id - name]

It would only has names characters from different languages like french, german, latin. . .

Do I use for such a table the collation of utf8mb_bin or stick with utf8mb4_unicode_520_ci,

In other hand, A table for topics of a blog for example:

[id - title - subject]

Do I put all the columns collation to utf8mb4_unicode_520_ci or use:

utf8mb_bin for title

utf8mb4_unicode_520_ci for subject

Since as I understood, the utf8mb4_unicode_520_ci has some emotes in it that would be used in blogs subjects, Or do I just ignore everything and just use utf8mb4_unicode_520_ci on all?

But overall, What is the point of using these different collations? And How does they affect my results in SELECT queries?

What I would like to know in berif is:

What collation to be used for each of:

  1. names
  2. titles
  3. subjects
  4. emails
  5. bios
  6. messages
  7. usernames

Solution

  • You're confusing encoding and collation.

    The available characters are defined by the encoding (and only the encoding). Since UTF-8 is a Unicode-compatible encoding, you have all characters. The MySQL peculiarity is that its utf8 encoding does not really implement UTF-8 but only a subset because it allocates 3 bytes per character and (as of today) some characters need 4 bytes. Thus utf8mb4 was born.

    Collation is a set of rules that tell you how WHERE foo = bar and ORDER BY foo work. You need to ask yourself: if I search for internet, should it match Internet? If you store French, German and Latin words you most likely don't want a binary collation. Ideally you want one with the exact rules of the language you'll be using but, since you're mixing languages, you'll have to opt for a generic collation. You can make an informed decision after reading Difference between utf8mb4_unicode_ci and utf8mb4_unicode_520_ci collations.