Search code examples
regexgoogle-bigquerystandards

Extract from string in BigQuery using regexp_extract


I have a long string in BigQuery where that I need to extract out some data.

Part of the string looks like this:

... source: "agent" resolved_query: "hi" score: 0.61254 parameters ...

I want to extract out data such as agent, hi, and 0.61254.

I'm trying to use regexp_extract but I can't get the regexp to work correctly:

select
regexp_extract([col],r'score: [0-9]*\.[0-9]+') as score,
regexp_extract([col],r'source: [^"]*') as source
from [table]

What should the regexp be to just get agent or 0.61254 without the field name and no quotation marks?

Thank you in advance.


Solution

  • You can use

    select
    regexp_extract([col],r'score:\s*(\d*\.?\d+)') as score,
    regexp_extract([col],r'resolved_query:\s*"([^"]*)"') as resolved_query,
    regexp_extract([col],r'source:\s*"([^"]*)"') as source
    from [table]
    

    Here,

    • score:\s*(\d*\.?\d+) matches score: string, then any zero or more whitespaces, and then there is a capturing group with ID=1 that captures zero or more digits, an optional . and then one or more digits
    • resolved_query:\s*"([^"]*)" matches a resolved_query: string, zero or more whitespaces, ", then captures into Group 1 any zero or more chars other than " and then matches a " char
    • source:\s*"([^"]*)" matches a source: string, zero or more whitespaces, ", then captures into Group 1 any zero or more chars other than " and then matches a " char.