Search code examples
sqlregexsnowflake-cloud-data-platformrlike

Exclude certain strings while including others using RLIKE


I'm working on a Snowflake query that uses the RLIKE function to find certain words in a block of text. This is a simplified version of my query:

SELECT id FROM table WHERE RLIKE (text,'.* red .*|.* green .*|.* blue .*','i')

Some examples of the table that I'm querying:

Row 1: id = 1, text = 'the table is red and nice'

Row 2: id = 2, text = 'there's a red rose in the garden'

Row 3: id = 3, text = 'I see a red rose in the green garden'

At this point, the query will return all 3 rows from the table. What I'm trying to do is exclude a certain word combination from the results. In this example I want to exclude 'red rose' from the results but only if there's no other word from the RLIKE list in the text.

So in my results, ID 1 should be included ('red' on itself has to be included), ID 2 should be excluded ('red rose' combination and no other colors in the text) and ID 3 should be included (even though it has 'red rose', it also has 'green', so should be included).

Is there any way to do this? Thanks in advance for your help

EDIT: just to clarify, I would also like to see things like 'the red rose in the red garden' in my results, because 'red' is in there once, without being next to 'rose'


Solution

  • Hmmm . . . the following may suffice for what you want and it doesn't use regular expressions:

    SELECT id
    FROM table
    WHERE text like '% green %' or
          text like '% blue %' or
          (text like '% red %' and text not like '% red rose %');
    

    This works for your sample data. But it would not return "red rose in the red garden". I'm not sure if you want that or not.

    If you want that row to appear as well, then one method is:

    SELECT id
    FROM table
    WHERE text like '% green %' or
          text like '% blue %' or
          replace(text, 'red rose', '') like '% red %' ;
    

    You could use rlike() with the same logic:

    WHERE rlike(replace(text, 'red rose', ''), ' red | green | blue ')