Tags: mysql, database, character-set, charset

utf-8 vs latin1


What are the advantages and disadvantages of using utf8 as a charset compared with latin1?

If utf8 can support more characters and is used consistently, wouldn't it always be the better choice? Is there any reason to choose latin1?


Solution

  • latin1 has the advantage of being a single-byte encoding, so it can store more characters in the same amount of storage space: the storage length of string data types in MySQL depends on the encoding. The manual states that:

    To calculate the number of bytes used to store a particular CHAR, VARCHAR, or TEXT column value, you must take into account the character set used for that column and whether the value contains multibyte characters. In particular, when using a utf8 Unicode character set, you must keep in mind that not all characters use the same number of bytes. utf8mb3 and utf8mb4 character sets can require up to three and four bytes per character, respectively. For a breakdown of the storage used for different categories of utf8mb3 or utf8mb4 characters, see Section 10.9, “Unicode Support”.

    Furthermore, many string operations (such as taking substrings and collation-dependent comparisons) are faster with single-byte encodings.

    In any case, latin1 is not a serious contender if you care about internationalization at all. It can be an appropriate choice when you know in advance that the stored values are safe (such as percent-encoded URLs).
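
To make the storage difference concrete, here is a minimal sketch in Python rather than MySQL. It assumes that Python's cp1252 codec is a close enough stand-in for MySQL's latin1 (which MySQL documents as cp1252 West European) and that Python's utf-8 codec matches the per-character byte counts of utf8mb4. The helper name byte_length and the sample strings are made up for illustration.

```python
# Compare how many bytes the same text needs in a single-byte
# encoding (cp1252, standing in for MySQL's latin1) and in UTF-8.

def byte_length(text, encoding):
    """Return the encoded size in bytes, or None if the text
    cannot be represented in the given encoding."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None

# Illustrative samples: plain ASCII, an accented letter, the euro
# sign, and an emoji outside the Basic Multilingual Plane.
samples = ["hello", "café", "€", "😀"]

for s in samples:
    single = byte_length(s, "cp1252")
    multi = byte_length(s, "utf-8")
    print(f"{s!r}: cp1252={single} bytes, utf-8={multi} bytes")
```

Pure ASCII text costs the same in both encodings. "café" takes 4 bytes in cp1252 and 5 in UTF-8, and "€" takes 1 byte versus 3. The emoji cannot be encoded in cp1252 at all and needs 4 bytes in UTF-8, which in MySQL terms means utf8mb4 rather than utf8mb3.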