Search code examples
sqlpostgresqlcase-when

Postgres: select using column as regex, when only some rows are valid regexes


TLDR

How to do a regex-match query with a column value ('input' ~ t.somecolumn), where just a known subset of rows has a valid regex in that column?

Full example

  • there is a blocked_items table including two varchar columns: type and value,
  • one of the types is DOMAIN_REGEX, and then the value always includes a correct regex,
  • but: for other types value doesn't need to be a regex and can cause errors when treated as one.

To check if a domain is blocked, I'm calling this query and passing the URL in question as $1 parameter:

SELECT 1 FROM blocked_items WHERE type = 'DOMAIN_REGEX' AND $1 ~ value LIMIT 1

The problem: on some database instances the query fails if rows with another type have value that's not a valid regex. On one database this query runs correctly, and on another instance, regardless of the input, throws: invalid regular expression: quantifier operand invalid.

Example test data:

| type         | value               |
|--------------+---------------------|
| EMAIL        | [email protected]   |
| DOMAIN_REGEX | test\d\.com         |

Question

I know the reason for my error is that the db engine can choose to check the second condition ($1 ~ value) first -- I've checked the EXPLAIN for my query and indeed it's different on these two database instances.

Is there a way I can

  • force the db to check the type column first, so the regex filter is always valid?
  • form the query differently to ignore the error for non-regex value entries? Or check if it's a valid regex first?
  • work around this issue in another way?

// I know changing the schema or using LIKE instead will probably suffice, but now that I stumbled upon this I'm curious if there is a solution using regexes like this :)


Solution

  • You are right, the schema is not great. If you still really have to keep the schema, you could try CASE/WHEN, https://www.postgresqltutorial.com/postgresql-case/