I have a column in my dataset where I want to extract the numbers in a specific location ONLY IF the structure of the string contains #-#-#
.
How would I write this in SQL (writing in snowflake currently)?
Example:
Column_A
abc-net-met-1234-12345-431-5968
abc-eme-ejt-emdn-1948-192
jen-mdk-ent-193-102-1029398
loe-ekd-12-49-nfm
I want my expected output to be:
Column_A Number1 Number2 Number3 Number4
abc-net-met-1234-12345-431-5968 1234 12345 431 5968
abc-eme-ejt-emdn-1948-192 NULL NULL NULL NULL
jen-mdk-ent-193-102-1029398 193 102 1029398 NULL
loe-ekd-12-49-nfm NULL NULL NULL NULL
I think it would be something like this but having trouble:
CASE WHEN COLUMN_A LIKE '%#-#-#%' THEN SPLIT_PART(COLUMN_A, '-', 4) ELSE NULL END Number1
CASE WHEN COLUMN_A LIKE '%#-#-#%' THEN SPLIT_PART(COLUMN_A, '-', 5) ELSE NULL END Number2
CASE WHEN COLUMN_A LIKE '%#-#-#%' THEN SPLIT_PART(COLUMN_A, '-', 6) ELSE NULL END Number3
CASE WHEN COLUMN_A LIKE '%#-#-#%' THEN SPLIT_PART(COLUMN_A, '-', 7) ELSE NULL END Number4
I think that your like
expression is too broad. You can narrow it with a regex match, like:
CASE
WHEN COLUMN_A RLIKE '.*(^|-)[0-9]+-[0-9]+-[0-9]+($|-).*'
THEN SPLIT_PART(COLUMN_A, '-', 4)
ELSE NULL
END Number1
The regexp matches on 3 consecutive groups of digits separated by dashes, and preceeded by a dash (or located at the beginning of the string), or followed by a dash (or located at the end of the string).
This can also be spelled:
CASE
WHEN COLUMN_A RLIKE '.*(^|-)\\d+-\\d+-\\d+($|-).*'
THEN SPLIT_PART(COLUMN_A, '-', 4)
ELSE NULL
END Number1