Search code examples
regexgoogle-bigqueryextractregexp-replace

REGEXP_EXTRACT with String Value in Bigquery


I want to extract words in a column, the column value looks like this:'p-fr-youtube-car'. And they should all be extracted to their own column.

INPUT:

p-fr-youtube-car

DESIRED OUTPUT:

Country = fr
Channel = youtube
Item = car

I've tried below to extract the first word, but can't figure out the rest.What RegEx will achieve my desired output from this input? And how can I make it not case sensative fr and FR will be the same.

REGEXP_EXTRACT_ALL(CampaignName, r"^p-([a-z]*)") AS Country


Solution

  • You can use [^-]+ to match parts between hyphens and only capture what you need to fetch.

    To get strings like youtube, you can use

    REGEXP_EXTRACT_ALL(CampaignName, r'^p-[^-]+-([^-]+)')
    

    To get strings like car, you can use

    REGEXP_EXTRACT_ALL(CampaignName, r'^p-[^-]+-[^-]+-([^-]+)')
    

    So, [^-]+ matches one or more chars other than - and ([^-]+) is the same pattern wrapped with a capturing group whose contents REGEXP_EXTRACT actually returns as a result.