Search code examples
sqlpostgresqlregex-lookarounds

Query Postgresql database to validate words order


Suppose you have a string such as 'The quick brown dog jumped over the lazy fox' in a column. How can I query if two words are present in the string, but in specific order?

For instance, if I want to retrieve all records that include fox and dog in that specific order. If we have the following strings, we just should get just the second record, not the first record:

The quick brown dog jumped over the lazy fox (should not be retrieved)
The quick brown fox jumped over the lazy dog (retrieved due to order)

A regular query does not keep the order of searching for fox first and then for dog:

SELECT d.text
from docs d
where lower(d.text) ~ '\yfox\y' and
      lower(d.text) ~ '\ydog\y'

How can I keep the precedence in the queried words? I was considering to use lookahead or lookbehind, but the word can be several words apart from each other.

Thanks


Solution

  • You can use

    where d.text ~* '\yfox\y.*\ydog\y'
    

    Details

    • \yfox\y - a whole word fox (\y is an equivalent of \b in PCRE and most common NFA regex engines, you may use it in many online regex testers)
    • .* - any 0 or more chars as many as possible
    • \ydog\y - a whole word dog.

    The ~* operator enables case insensitive matching.