MySQL 8.0 and later supports collations for utf8mb4.
But if a CHAR column only ever holds an alphanumeric string, would it be better to give it a custom collation in utf8 or latin1?
I use Flask-SQLAlchemy, and my project sets SQLALCHEMY_DATABASE_URI = 'mysql+mysqldb://root:@localhost:3306/testdb?charset=utf8mb4'
After upgrading MySQL to 8.0, all tables are created with a utf8mb4 collation.
For example:
class Topic(db.Model, CoModel):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    name = db.Column(db.String(168))
    content = db.Column(db.Text)
which produces this table in MySQL:
CREATE TABLE `topic` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(168) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `content` text COLLATE utf8mb4_general_ci,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
Should I customize individual columns, e.g. name = db.Column(db.String(168, collation="utf8_general_ci"))?
utf8mb4 is good at supporting a huge range of characters.
Or should I configure utf8mb4 as the default and use it everywhere?
Going forward, you should use utf8mb4 for almost all CHAR/VARCHAR/TEXT columns.
CHARACTER SET utf8mb4 covers essentially all the world's characters. If your client encodes characters as UTF-8 (the outside equivalent of utf8mb4), then utf8mb4 is the right fit.
Moving from utf8 to utf8mb4 is safe, because the former is a subset of the latter. MySQL's utf8 stores at most 3 bytes per character; utf8mb4 adds the 4-byte characters, which are Emoji and some Chinese characters.
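To make the utf8 versus utf8mb4 difference concrete, here is a minimal Python sketch. It relies only on the fact that a character needing 4 bytes in UTF-8 does not fit in MySQL's 3-byte utf8. The helper name needs_utf8mb4 and the sample strings are made up for this illustration.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character in `text` takes 4 bytes in UTF-8.

    Such characters cannot be stored in MySQL's 3-byte utf8 and
    need utf8mb4. (Hypothetical helper, for illustration only.)
    """
    return any(len(ch.encode("utf-8")) == 4 for ch in text)


if __name__ == "__main__":
    samples = [
        "Topic42",        # plain alphanumeric text
        "中文",            # common Chinese characters (3 bytes each)
        "\U00020000",     # a rarer Chinese character outside the 3-byte range
        "\U0001F600",     # an Emoji
    ]
    for s in samples:
        print(repr(s), needs_utf8mb4(s))
```

Plain alphanumeric strings never trip this check, which is why they store identically under utf8 and utf8mb4.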
Mixing latin1 with utf8 or utf8mb4 is possible, but this forum is full of programmers/DBAs who screw it up.
MySQL 8.0 changed the default character set to utf8mb4 for a lot of good reasons.
Note MySQL's convention of xxxx_yyy_ci being a collation that applies to the character set xxxx. That is, utf8_general_ci belongs with utf8, not utf8mb4.
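The naming convention can be sketched in a few lines of Python. This is only a string heuristic based on the xxxx_yyy_ci pattern described above, and the function names charset_of and belongs_to are invented for the example; the server itself (SHOW COLLATION) remains the authority on which collations go with which character set.

```python
def charset_of(collation: str) -> str:
    """Character set implied by a collation name.

    Assumes the xxxx_yyy_ci convention: everything before the first
    underscore is the character set. (Hypothetical helper.)
    """
    return collation.split("_", 1)[0]


def belongs_to(collation: str, charset: str) -> bool:
    """True if the collation name is prefixed by the given character set."""
    return charset_of(collation) == charset


if __name__ == "__main__":
    for name in ("utf8mb4_general_ci", "utf8_general_ci", "latin1_swedish_ci"):
        print(name, "->", charset_of(name))
    print(belongs_to("utf8_general_ci", "utf8mb4"))
```

So a column declared with collation="utf8_general_ci" ends up as a utf8 column, even inside a table whose default is utf8mb4.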
A "character set" is an encoding. A "collation" is a set of rules for comparing strings. Example: Should 'A' be treated as equal to 'a'?
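As a rough analogy in Python (not MySQL's actual comparison algorithm), the two functions below mimic what a case-insensitive collation and a binary collation would answer for that question. The function names are made up here, and real collations also settle things this sketch ignores, such as accents and sort order.

```python
def equal_case_insensitive(a: str, b: str) -> bool:
    """Compare the way a case-insensitive (_ci) collation roughly would."""
    return a.casefold() == b.casefold()


def equal_binary(a: str, b: str) -> bool:
    """Compare the encoded bytes, roughly like a binary (_bin) collation."""
    return a.encode("utf-8") == b.encode("utf-8")


if __name__ == "__main__":
    print(equal_case_insensitive("A", "a"))
    print(equal_binary("A", "a"))
```

The encoding of the stored bytes is the same in both cases; only the comparison rule differs, which is exactly the split between character set and collation.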