Using regex in Postgres for special characters

I'm using in Postgres 17.1 on Ubuntu.

I want to understand the regex for characters - what is the difference between:

1. delete from tablea where  col~'^[\u2611-\uffffff]+$'
2. delete from tablea where  col~'^[\2611-\uffffff]+$'

I meant to delete some rows with symbols and non English with (1) but I used reg ex #2. Did I delete something important?

And when to use:

 \\unnnn \\xnn \\nnnn

and what are the ranges for each?

Which of those is ASCII and which is Unicode?

Solution

See official documentation on escape sequences in regexes.
For quick experiments, try SELECT substring(E'put string here' from 'put regex here');

`\NNN` vs. `\xNN` vs. `\uNNNN`

\47 and \047 stand for a char with octal value 47_oct = 4⋅8¹ + 7⋅8⁰ = 39
\x27, \x027, \x0027, and so on stands for a char with hexadecimal value 27_hex = 2⋅16¹ + 7⋅16⁰ = 39
In this example, they stand for the char ' and can be used interchangeably.

\u0053 is just like \x53, \x053, or \x0053, but uses exactly 4 hexadecimal digits. This may be beneficial when followed by literal characters.
Example: You want a string for 'A. To circumvent issues with the '-character inside a 'single-quoted string', you cannot write \x27A or \x0027A, as the A would be interpreted as part of the number in both cases. Instead, you can write \u0027A.

I used `\2611` instead of `\u2611`. Did i delete something important?

Yes, probably.

Similar to \u... octal escape sequences only work for two and three digits. \2611 is \261 followed by a literal 1.
Also, in both versions, you wrote \uffffff which is \xffff followed by a literal ff.

This means you basically ran

delete from tablea where col~'^([1-\uffff]|\261|f)+$'

The meaning of \261 depends on your database encoding (check SHOW SERVER_ENCODING;). If UTF-8 is used, it is U+00B1 ±. But that should not be your problem.

With 1-\uffff (includes all ASCII symbols, except for control characters and !"#$%&'()*+,-./0) I'm pretty sure you deleted something you wanted to keep.

Using regex in Postgres for special characters

\NNN vs. \xNN vs. \uNNNN

I used \2611 instead of \u2611. Did i delete something important?

`\NNN` vs. `\xNN` vs. `\uNNNN`

I used `\2611` instead of `\u2611`. Did i delete something important?