I currently have the need to write a SQL query to determine the number of rows in my UTF8 Oracle database that are not compatible with another system that uses LATIN-1.
For example, Über
should not return a result, but 翻译
should
I have tried queries such as:
select decode(convert(convert('Über test', 'WE8ISO8859P1'), 'UTF8'), convert('Über test', 'UTF8'), 1, 0) from dual;
However, this does not give me the result that I need. Can anyone provide a SQL-only solution to this problem? Thanks
I think I have figured it out:
select * from (select asciistr(convert('test string goes here', 'UTF8')) as str from dual) where regexp_like(str, '.*\\([1-9A-F]|0[1-9A-F]).*');
Using http://en.wikipedia.org/wiki/Latin-1_Supplement_%28Unicode_block%29 as a reference, the LATIN-1 block of unicode ends at \00FF.
For example,
SQL> select * from (select asciistr(convert('翻译', 'UTF8')) as str from dual) where regexp_like(str, '.*\\([1-9A-F]|0[1-9A-F]).*');
STR
------------------------------
\7FFB\8BD1
If someone could double-check this from a logical standpoint, I would appreciate it.