I have dumped my small MySQL table (manually reduced to localize the problem) to show it here:
SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
SET time_zone = "+00:00";
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8mb4 */;
CREATE TABLE `symb` (
`smb` varchar(200) NOT NULL,
`trtmnt` varchar(200) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `symb` (`smb`, `trtmnt`) VALUES
('і', 'ty'),
('ї', 'hr');
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
If you create the MySQL table above and run this query
select * from symb where smb = 'ї';
or this one (queries are different - please see the symbols 'ї' vs 'і')
select * from symb where smb = 'і';
then you may see you get two rows selected instead of one as I would expect.
To reemphasize, these two select queries above are different - the symbol 'ї' is different from 'і' (both are cyrillic symbols, 'і' is NOT latin here).
Collation chosen was utf8_general_ci
Any reasons why 'і' and 'ї' are treated as the same symbols and what's the proper way to make it different? I need to select the exact row, not two.
Queries above were tested in phpMyAdmin and HeidiSQL which means that's MySQL (collation?) issue, not the program used to run queries. Each different symbol should be treated as a different symbol and the table should be case sensitive. What's wrong with the table above? As result I'm unable to set unique key for this row.
Thank you.
Just added based on comments: What does SHOW TABLE STATUS LIKE 'symb' show you? It shows me:
Name symb
Engine InnoDB
Version 10
Row_format Compact
Rows 2
Avg_row_length 8192
Data_length 16384
Max_data_length 0
Index_length 0
Data_free 0
Auto_increment NULL
Create_time 22.05.16 12:11
Update_time NULL
Check_time NULL
Collation utf8_general_ci
Checksum NULL
Create_options
Comment
Because your SELECT
statement is returning both records, it appears that your data has already been encoded wrongly into UTF-8. So merely changing the encoding of the smb
column from Latin1 to UTF-8 won't work. One option for you would be to dump the database to binary, and then reimport it as UTF-8:
mysqldump --add-drop-table your_database | replace CHARSET=latin1 CHARSET=utf8 |
iconv -f latin1 -t utf8 | mysql your_database