Tags: mysql, utf-8, flask-sqlalchemy, iso-8859-1, utf8mb4

MySQL: When should I declare a CHAR (String) column as utf8 or latin1?


MySQL 8.0 and later default to the utf8mb4 character set.

But if a CHAR column only holds alphanumeric strings, would it be better to give it a custom utf8 or latin1 collation?


I use Flask-SQLAlchemy, and my project sets SQLALCHEMY_DATABASE_URI = 'mysql+mysqldb://root:@localhost:3306/testdb?charset=utf8mb4'

After upgrading MySQL to 8.0, all tables are created with a utf8mb4 collation.

For example, this model:

class Topic(db.Model, CoModel):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    name = db.Column(db.String(168))
    content = db.Column(db.Text)

produces this table in MySQL:

CREATE TABLE `topic` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(168) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `content` text COLLATE utf8mb4_general_ci,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

Should I override the collation on the column, e.g. name = db.Column(db.String(168, collation="utf8_general_ci"))?


utf8mb4 can store a very wide range of characters.

Should I configure utf8mb4 as the default and use it everywhere?


Solution

  • Going forward, you should use utf8mb4 for almost all CHAR/VARCHAR/TEXT columns.

    CHARACTER SET utf8mb4 covers essentially all the world's writing systems. If your client encodes characters as UTF-8 (the name used outside MySQL for what MySQL calls utf8mb4), then utf8mb4 is the right choice.

    Moving from utf8 to utf8mb4 is good. The former is a subset of the latter: MySQL's utf8 stores at most 3 bytes per character, while utf8mb4 allows 4. The practical difference is Emoji and some Chinese characters.

    Mixing latin1 with utf8 or utf8mb4 is possible, but this forum is full of programmers and DBAs who have gotten it wrong.

    MySQL 8.0 changed the default character set to utf8mb4 for a lot of good reasons.

    Note MySQL's naming convention: a collation called xxxx_yyy_ci applies to the character set xxxx. That is, utf8_general_ci belongs with utf8, not utf8mb4; the utf8mb4 counterpart is utf8mb4_general_ci.

    A "character set" is an encoding. A "collation" is a set of rules for comparing strings. Example: should 'A' be treated as equal to 'a'? The _ci suffix means case-insensitive, so under such a collation they compare as equal.
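
The distinction between encoding and collation, and the utf8 versus utf8mb4 difference, can be illustrated in plain Python. This is only a sketch under stated assumptions: Python's "utf-8" codec stands in for MySQL's utf8mb4, "latin-1" stands in for MySQL's latin1, and the helper names (encoded_length, fits_in_utf8mb3, equal_ci) are hypothetical, not part of MySQL or SQLAlchemy. The casefold() comparison is a loose analogy for a _ci collation, not MySQL's actual comparison algorithm.

```python
def encoded_length(text: str, encoding: str) -> int:
    """Return how many bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


def fits_in_utf8mb3(text: str) -> bool:
    """Return True if every character needs at most 3 bytes in UTF-8.

    MySQL's legacy utf8 (utf8mb3) stores at most 3 bytes per character,
    which corresponds to code points up to U+FFFF.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


def equal_ci(a: str, b: str) -> bool:
    """Compare two strings ignoring case.

    Assumption: this is only a Python stand-in for a *_ci collation,
    not the rules MySQL really applies.
    """
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    # Character set = encoding: list the UTF-8 byte count of each character
    # and whether the whole string would fit in the 3-byte utf8.
    for sample in ["Topic42", "café", "中文", "😀"]:
        widths = [len(ch.encode("utf-8")) for ch in sample]
        print(sample, widths, fits_in_utf8mb3(sample))

    # Plain alphanumeric text takes one byte per character in both encodings.
    print(encoded_length("Topic42", "latin-1"), encoded_length("Topic42", "utf-8"))

    # Collation = comparison rule: a case-insensitive rule treats 'A' and 'a'
    # as equal, while a byte-for-byte comparison does not.
    print(equal_ci("A", "a"), "A" == "a")
```

For the alphanumeric column in the question, the encoded bytes are the same either way, so switching that one column to latin1 or utf8 would not make the stored text itself any smaller.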