I have created a table with an UTF-8 VARCHAR(5000)
, and filled it with data.
But it looks like this field is allowing more data than it is instructed to:
mysql> DESCRIBE test;
+---------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| comment | varchar(5000) | YES | | NULL | |
+---------+------------------+------+-----+---------+----------------+
2 rows in set (0.01 sec)
mysql> SELECT MAX(LENGTH(comment)) FROM test;
+----------------------+
| MAX(LENGTH(comment)) |
+----------------------+
| 5001 |
+----------------------+
1 row in set (0.01 sec)
Why is that?
Ok, the problem is that LENGTH() returns the length in bytes, not chars. Because the string is UTF-8, I need to use CHAR_LENGTH() instead:
mysql> DESCRIBE test;
+---------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| comment | varchar(5000) | YES | | NULL | |
+---------+------------------+------+-----+---------+----------------+
2 rows in set (0.01 sec)
mysql> SELECT MAX(LENGTH(comment)) FROM test;
+----------------------+
| MAX(LENGTH(comment)) |
+----------------------+
| 5001 |
+----------------------+
1 row in set (0.01 sec)
mysql> SELECT MAX(CHAR_LENGTH(comment)) FROM test;
+----------------------+
| MAX(LENGTH(comment)) |
+----------------------+
| 5000 |
+----------------------+
1 row in set (0.01 sec)
The length was 5001 because the string contained exactly one two-byte character!