sql regex google-bigquery regexp-replace

REGEXP_REPLACE URL BIGQUERY

I have two types of URL's which I would need to clean, they look like this:

["//xxx.com/se/something?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]
["//www.xxx.com/se/car?p_color_car=White?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]

The outcome I want is; SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"

I want to remove the brackets and everything up to SE, the URLS differ so I want to remove:

First URL
["//xxx.com/se/something?

Second URL:
["//www.xxx.com/se/car?p_color_car=White?

I can't get my head around it,I've tried this .*\/ .
But it will still keep strings I don't want such as:
(1 url) = something?

(2 url) car?p_color_car=White?

Solution

You can use

regexp_replace(FinalUrls, r'.*\?|"\]$', '')

See the regex demo

Details

.*\? - any zero or more chars other than line breakchars, as many as possible and then ? char
| - or
"\]$ - a "] substring at the end of the string.

Mind the regexp_replace syntax, you can't omit the replacement argument, see reference:

REGEXP_REPLACE(value, regexp, replacement)

Returns a STRING where all substrings of value that match regular expression regexp are replaced with replacement.

You can use backslashed-escaped digits (\1 to \9) within the replacement argument to insert text matching the corresponding parenthesized group in the regexp pattern. Use \0 to refer to the entire matching text.