Search code examples
sqlstringsplitsql-likesnowflake-cloud-data-platform

Extract if string contains numbers pattern


I have a column in my dataset where I want to extract the numbers in a specific location ONLY IF the structure of the string contains #-#-#.

How would I write this in SQL (writing in snowflake currently)?

Example:

Column_A
abc-net-met-1234-12345-431-5968
abc-eme-ejt-emdn-1948-192
jen-mdk-ent-193-102-1029398
loe-ekd-12-49-nfm

I want my expected output to be:

Column_A                            Number1    Number2    Number3   Number4
abc-net-met-1234-12345-431-5968        1234      12345        431      5968
abc-eme-ejt-emdn-1948-192              NULL       NULL       NULL      NULL            
jen-mdk-ent-193-102-1029398             193        102    1029398      NULL
loe-ekd-12-49-nfm                      NULL       NULL       NULL      NULL

I think it would be something like this but having trouble:

CASE WHEN COLUMN_A LIKE '%#-#-#%' THEN SPLIT_PART(COLUMN_A, '-', 4) ELSE NULL END Number1
CASE WHEN COLUMN_A LIKE '%#-#-#%' THEN SPLIT_PART(COLUMN_A, '-', 5) ELSE NULL END Number2
CASE WHEN COLUMN_A LIKE '%#-#-#%' THEN SPLIT_PART(COLUMN_A, '-', 6) ELSE NULL END Number3
CASE WHEN COLUMN_A LIKE '%#-#-#%' THEN SPLIT_PART(COLUMN_A, '-', 7) ELSE NULL END Number4

Solution

  • I think that your like expression is too broad. You can narrow it with a regex match, like:

    CASE 
        WHEN COLUMN_A RLIKE '.*(^|-)[0-9]+-[0-9]+-[0-9]+($|-).*' 
        THEN SPLIT_PART(COLUMN_A, '-', 4) 
        ELSE NULL 
    END Number1
    

    The regexp matches on 3 consecutive groups of digits separated by dashes, and preceeded by a dash (or located at the beginning of the string), or followed by a dash (or located at the end of the string).

    This can also be spelled:

    CASE 
        WHEN COLUMN_A RLIKE '.*(^|-)\\d+-\\d+-\\d+($|-).*' 
        THEN SPLIT_PART(COLUMN_A, '-', 4) 
        ELSE NULL 
    END Number1