Search code examples
stringsubstringsnowflake-cloud-data-platformstring-matching

Regular Expression to Match string pattern followed by N digits in snowflake


I have dataset like below in table A and I want result-set where the pattern for value column is like ABC followed by any 8 digits.

Table A

Output should be like below after regex match

[Table B2

I tried to use regex_sbustr but failed to get actual result set for 8 digits pattern.

Will appreciate your help


Solution

  • And to show @JNevil's REGEXP_LIKE solution working:

    select 
        column1, 
        column2, 
        REGEXP_LIKE(column2, 'ABC[0-9]{8}.*') as match
    from values
        (1, 'ABC123456788999'),
        (2, 'ABC123458765uhfh=hh'),
        (3, 'BCA123456788999'),
        (4, 'ABC987654321'),
        (5, 'ABC876hjkl90'),
        (6, 'ABC876');
    

    gives:

    COLUMN1 COLUMN2 MATCH
    1 ABC123456788999 TRUE
    2 ABC123458765uhfh=hh TRUE
    3 BCA123456788999 FALSE
    4 ABC987654321 TRUE
    5 ABC876hjkl90 FALSE
    6 ABC876 FALSE

    thus in filtering form:

    select 
        column1, 
        column2 as value
    from values
        (1, 'ABC123456788999'),
        (2, 'ABC123458765uhfh=hh'),
        (3, 'BCA123456788999'),
        (4, 'ABC987654321'),
        (5, 'ABC876hjkl90'),
        (6, 'ABC876')
    where REGEXP_LIKE(column2, 'ABC[0-9]{8}.*'); 
    

    gives:

    COLUMN1 VALUE
    1 ABC123456788999
    2 ABC123458765uhfh=hh
    4 ABC987654321

    As per the documents, the LIKE version is automatically anchored, which means there is an implicit ^ and $ added to the begin/end of you regexp string, thus the need for the .* in this solution, otherwise none of the given input will match, as they all have 9 or more tokens.

    as seen here (with extra 8 numeric input):

    select 
        column1, 
        column2, 
        REGEXP_LIKE(column2, 'ABC[0-9]{8}') as match
    from values
        (1, 'ABC123456788999'),
        (2, 'ABC123458765uhfh=hh'),
        (3, 'BCA123456788999'),
        (4, 'ABC987654321'),
        (5, 'ABC876hjkl90'),
        (6, 'ABC876'),
        (8, 'ABC12345678')
        ;    
    
    COLUMN1 COLUMN2 MATCH
    1 ABC123456788999 FALSE
    2 ABC123458765uhfh=hh FALSE
    3 BCA123456788999 FALSE
    4 ABC987654321 FALSE
    5 ABC876hjkl90 FALSE
    6 ABC876 FALSE
    8 ABC12345678 TRUE