How can I extract all text between single full stops (i.e. ignore ...
) with a re2 regex?
I am using the REGEXP_EXTRACT_ALL
function in BigQuery which uses https://github.com/google/re2/wiki/Syntax.
From the following example:
This is... a.. sentence. It is just an example.
I would like the query to extract
This is... a.. sentence.
and It is just an example.
I am specifically interested in whether this is possible using SQL functions in BigQuery rather than introducing another tool
Consider below workaround
select text, regexp_replace(sentence, r'(#)(\.+)(#)', r'\2') sentence
from `project.dataset.table`,
unnest(split(trim(regexp_replace(regexp_replace(text, r'(\.+)', r'#\1#'), r'(\#\.\#)', r'####'), '####'), '####')) sentence
if applied to sample data in your question - output is