I have a long string in BigQuery where that I need to extract out some data.
Part of the string looks like this:
... source: "agent" resolved_query: "hi" score: 0.61254 parameters ...
I want to extract out data such as agent
, hi
, and 0.61254
.
I'm trying to use regexp_extract but I can't get the regexp to work correctly:
select
regexp_extract([col],r'score: [0-9]*\.[0-9]+') as score,
regexp_extract([col],r'source: [^"]*') as source
from [table]
What should the regexp be to just get agent
or 0.61254
without the field name and no quotation marks?
Thank you in advance.
You can use
select
regexp_extract([col],r'score:\s*(\d*\.?\d+)') as score,
regexp_extract([col],r'resolved_query:\s*"([^"]*)"') as resolved_query,
regexp_extract([col],r'source:\s*"([^"]*)"') as source
from [table]
Here,
score:\s*(\d*\.?\d+)
matches score:
string, then any zero or more whitespaces, and then there is a capturing group with ID=1 that captures zero or more digits, an optional .
and then one or more digitsresolved_query:\s*"([^"]*)"
matches a resolved_query:
string, zero or more whitespaces, "
, then captures into Group 1 any zero or more chars other than "
and then matches a "
charsource:\s*"([^"]*)"
matches a source:
string, zero or more whitespaces, "
, then captures into Group 1 any zero or more chars other than "
and then matches a "
char.