I want to match filter results exactly using CHARACTER SET utf8mb4.
ALTER DATABASE test CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
CREATE TABLE users (
username VARCHAR(25) NOT NULL,
password VARCHAR(25) NULL NULL
)
ENGINE=innodb DEFAULT CHARSET=utf8mb4
COLLATE=utf8mb4_general_ci ROW_FORMAT=DYNAMIC;
CREATE INDEX users_username_idx ON users (username);
I inserted 1 million rows also containing below rows.
INSERT INTO users SELECT 'üsasdasd','somepassword';
INSERT INTO users SELECT 'usasdasd','somepassword';
INSERT INTO users SELECT 'pğasdasdasd','somepassword';
INSERT INTO users SELECT 'anfüs','somepassword';
INSERT INTO users SELECT 'anfus','somepassword';
Then applied some queries.
SELECT * FROM users WHERE username LIKE 'üs%';
SELECT * FROM users WHERE username LIKE 'us%';
SELECT * FROM users WHERE username LIKE 'pğ%';
SELECT * FROM users WHERE username = 'anfüs';
SELECT * FROM users WHERE username = 'anfus';
Why do i get results that ALSO have ü letter(like üsasdasd or anfüs) when i filter as "LIKE 'us%'"
or " = 'anfus'"
OR
Why do i get results that ALSO have u letter(like usasdasd or anfus) when i filter as "LIKE 'üs%'"
or " = 'anfüs'"
?
How can i get results that ONLY have ü letter(like üsasdasd or anfüs) when i filter as "LIKE 'üs%'"
or " = 'anfüs'"
vice versa ?
ANSWER:
I dont know why but, COLLATE=utf8mb4_0900_as_cs fixed the issue.
For passwords, you want to ignore all case folding, accent stripping, etc, so use COLLATE utf8mb4_bin
.
I suspect utf8mb4_0900_as_cs
is not quite the same as _bin
.
On the other hand, do you really want to store "plaintext" passwords? A common technique is to store a hash of the pwd and do the hash when testing. A simple hash that is secure from casual hackers:
MD5(CONCAT(password, 'my secret salt'))