Search code examples
mysqldouble-byte

MySQL: char_length(), wrong value for Russian


I am using char_length() to measure the size of "Русский": strangely, instead of telling me that it's 7 chars, it tells me there are 14. Interestingly if the query is simply...

SELECT CHAR_LENGTH('Русский')

...the answer is correct. However if I query the DB instead, the anser is 14:

SELECT CHAR_LENGTH(text) FROM locales WHERE lang = 'ru-RU' AND name = 'lang_name'

Anybody go any ideas what I might be doing wrong? I can confirm that the collation is utf8_general_ci and the table is MyISAM

Thanks, Adrien

EDIT: My end objective is to be able to measure the lengths of records in a table containing single and double-byte chracters (eg. English & Russian, but not limited to these two languages only)


Solution

  • Because of two bytes is used for each UTF8 char. See http://dev.mysql.com/doc/refman/5.5/en/string-functions.html#function_char-length

    mysql> set names utf8;
    mysql> SELECT CHAR_LENGTH('Русский'); result - 7
    mysql> SELECT CHAR_LENGTH('test'); result  - 4
    
    create table test123 (
    text VARCHAR(255) NOT NULL DEFAULT '',
    text_text TEXT) Engine=Innodb default charset=UTF8;
    
    insert into test123 VALUES('русский','test русский');
    
    SELECT CHAR_LENGTH(text),CHAR_LENGTH(text_text) from test123; result - 7 and 12
    

    I have tested work with: set names koi8r; create table and so on and got invalid result. So the solution is recreate table and insert all data after setting set names UTF8.