Search code examples
mysqlsqlutf-8asciiutf8mb4

SQL how to SELECT all utf8mb4 characters?


I have this query:

SELECT count(*) from TABLE WHERE LENGTH(COLUMN) !=CHAR_LENGTH(COLUMN);

If count returns a value more than zero it tell me that I have non-ASCII characters in some row.

How I can know if I have utf8mb4 characters in the TABLE?

Is there a way to query all utf8mb4 characters?


Solution

  • It depends on what you mean by "utf8mb4 characters". This sentence is entirely composed of "utf8mb4 characters". This sentence is entirely composed of "ascii" characters.

    Assuming you meant "non-ASCII" and the column is CHARACTER SET utf8mb4, then your query should work fine.

    This technique works for any of the multi-byte character sets, such as utf8, big5, etc. It does not work for single-byte character sets such as latin1, latin5, etc.

    If you want to extract the non-ascii bytes from the column, that would be better done in some application language. It might have a straightforward way of doing that, or you could fetch the HEX and look for some pair of hex with regexp [CDEF].

    If you meant "utf8mb4" but not "utf8", then the hex would be F. And the row can be discovered via

    HEX(col) RLIKE "^(..)*F."