I have this query:
SELECT count(*) from TABLE WHERE LENGTH(COLUMN) !=CHAR_LENGTH(COLUMN);
If count returns a value more than zero it tell me that I have non-ASCII characters in some row.
How I can know if I have utf8mb4 characters in the TABLE?
Is there a way to query all utf8mb4 characters?
It depends on what you mean by "utf8mb4 characters". This sentence is entirely composed of "utf8mb4 characters". This sentence is entirely composed of "ascii" characters.
Assuming you meant "non-ASCII" and the column is CHARACTER SET utf8mb4
, then your query should work fine.
This technique works for any of the multi-byte character sets, such as utf8, big5, etc. It does not work for single-byte character sets such as latin1, latin5, etc.
If you want to extract the non-ascii bytes from the column, that would be better done in some application language. It might have a straightforward way of doing that, or you could fetch the HEX and look for some pair of hex with regexp [CDEF].
If you meant "utf8mb4" but not "utf8", then the hex would be F.
And the row can be discovered via
HEX(col) RLIKE "^(..)*F."