Oracle - Identify rows that contain chinese characters in a CLOB column

I have an Oracle table with a description column whose type is CLOB. The column can contain text in whatever language. I'm interested in retrieving all the rows that contains at least one chinese character in that description column.

How can I do?

Some resources that I looked at:

https://stackoverflow.com/a/1831128/16797805: not working because the column I have is CLOB (got ORA-22998)
https://docs.oracle.com/database/121/ARPLS/d_lob.htm: documentation for DBMS_LOB functions to operate on CLOB. I looked at INSTR, SUBSTR and GETLENGTH functions, but didn't find a way to make them work.

Solution

You can use a regular expression function like regexp_instr(), and use ranges of characters you want to search for.

If you have data like:

ID	DESCRIPTION
1	Just ASCII
2	With Chinese 好世界 text
3	With Russian Привет, мир text

then using the ranges from this answer you can do:

select *
from your_table
where regexp_instr(description, unistr('[\2E80-\2FD5\3190-\319f\3400-\4DBF\4E00-\9FCC\F900-\FAAD]')) > 0

ID	DESCRIPTION
2	With Chinese 好世界 text

fiddle

As discussed in the various answers on the linked question, these are CJK ranges so not strictly just Chinese, and you may need to include additional extension ranges, but... might be good enough for what you're doing, or at least a starting point you can work from.

Also notice that I've used the unistr() function to convert the ranges to something that can be recognised as a regex pattern, and that doesn't need the u prefixes in the linked answer - so it takes for example \2E80-\2FD5 rather than u\2E80-u\2FD5.

You could also possibly use Oracle Text's CHINESE_LEXER or MULTI_LEXER to build indexes that let you search for Chinese text - but that isn't something I've ever looked at, so you'd need to experiment or find examples or other answers that can explore it. This documentation may or may not help...