I am trying to extract Chinese characters from a string using ClickHouse SQL.
To extract Latin letters, I use:
select extractAll('dkfdfjsd1234中文字符串', '[a-zA-Z]')
This successfully returns:
['d','k','f','d','f','j','s','d']
Now I want to extract the Chinese characters in the same way. I tried:
select extractAll('dkfdfjsd1234中文字符串', '[\u4e00-\u9fa5]')
It returns an error:
Code: 427, e.displayText() = DB::Exception: OptimizedRegularExpression: cannot compile re2: [\u4e00-\u9fa5], error: invalid escape sequence: \u. Look at https://github.com/google/re2/wiki/Syntax for reference. Please note that if you specify regex as an SQL string literal, the slashes have to be additionally escaped. For example, to match an opening brace, write '\(' -- the first slash is for SQL and the second one is for regex (version 20.8.14.4 (official build))
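RE2 does not support the \u escape. To match a Unicode code point, use \x{FFFF} instead, and double the backslash because the pattern is written as an SQL string literal: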
SELECT extractAll('dkfdfjsd1234中文字符串', '[\\x{4e00}-\\x{9fa5}]') AS result
/*
┌─result─────────────────────┐
│ ['中','文','字','符','串'] │
└────────────────────────────┘
*/
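
If whole runs of Chinese text are wanted rather than single characters, the same character class can take a + quantifier. The second query below is only a sketch: it assumes the RE2 build bundled with your ClickHouse version accepts the Unicode script class \p{Han}, which is listed in the RE2 syntax reference but was not tested against the 20.8 version mentioned in the question.

```sql
-- Same code point range as above, with + so that adjacent characters
-- are returned as one match instead of one element per character.
SELECT extractAll('dkfdfjsd1234中文字符串', '[\\x{4e00}-\\x{9fa5}]+') AS result;
-- ['中文字符串']

-- Assumption: \p{Han} is available in this RE2 build. It matches by
-- script rather than by a fixed range, so it also covers ideographs
-- outside 4e00-9fa5.
SELECT extractAll('dkfdfjsd1234中文字符串', '\\p{Han}') AS result;
```

The explicit range is the safer choice when the set of characters to match has to be known exactly; the script class is shorter to write if it turns out to be supported.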