regex, google-bigquery, re2

How do you turn on flags for a capturing subgroup using re2 regular expressions with BigQuery?


This question is specific to Google's re2 regular expression library. There are solutions to this problem for other engines, but they don't apply to re2.

I want to use flags such as i (case-insensitive) and s (dot matches newline) in BigQuery's REGEXP_REPLACE, without the group becoming non-capturing, which is what re2's scoped flag syntax (?flags:re) mandates.

For the inputs My name is X, my Name is Y, and MY NAME is Z, I want to replace X, Y, and Z with some constant C, so that the output is, for example, My name is C.

I know that the flags can be applied with a pattern such as (?is:My name is )(\w+). However, the first group then becomes non-capturing, so I can't refer to it by number: \1 refers to X rather than to My name is, as in this call:

REGEXP_REPLACE(str, r'(?is:My name is )(\w+). ', r'\1 C')

How can I use flags with capturing groups?


Solution

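  • Use a nested capturing group inside the non-capturing flag group. The flags apply to everything inside (?is:...), and because the outer group is not numbered, the inner parentheses become capturing group 1: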

    (?is:(My name is ))(\w+)

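    REGEXP_REPLACE(str, r'(?is:(My name is ))(\w+)', r'\1C')

    Here \1 is the matched prefix in its original casing, including its trailing space, so the replacement is written \1C rather than \1 C.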
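The group numbering can also be checked outside BigQuery. The sketch below is in Go rather than BigQuery SQL, on the grounds that Go's standard regexp package accepts RE2 syntax. The names nameRe and replaceName are made up for illustration, and Go writes the backreference as ${1} where BigQuery uses \1.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameRe wraps a capturing group in a scoped flag group. The outer
// (?is:...) is non-capturing, so the inner group is number 1 and
// (\w+) is number 2.
var nameRe = regexp.MustCompile(`(?is:(My name is ))(\w+)`)

// replaceName keeps the matched prefix (group 1) and replaces the name
// with the constant "C". ${1} is Go's spelling of BigQuery's \1.
func replaceName(s string) string {
	return nameRe.ReplaceAllString(s, "${1}C")
}

func main() {
	for _, s := range []string{"My name is X", "my Name is Y", "MY NAME is Z"} {
		fmt.Printf("%q -> %q\n", s, replaceName(s))
	}
	// Only the two inner groups are counted as capturing groups.
	fmt.Println("capturing groups:", nameRe.NumSubexp())
}
```

Because group 1 holds the text as it was matched, each input keeps its own casing: my Name is Y becomes my Name is C, not My name is C.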