Search code examples
sqlregexgoogle-bigqueryregexp-replace

REGEXP_REPLACE URL BIGQUERY


I have two types of URL's which I would need to clean, they look like this:

["//xxx.com/se/something?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]
["//www.xxx.com/se/car?p_color_car=White?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]

The outcome I want is; SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"


I want to remove the brackets and everything up to SE, the URLS differ so I want to remove:

First URL
["//xxx.com/se/something?

Second URL:
["//www.xxx.com/se/car?p_color_car=White?


I can't get my head around it,I've tried this .*\/ .
But it will still keep strings I don't want such as:
(1 url) = something?

(2 url) car?p_color_car=White?


Solution

  • You can use

    regexp_replace(FinalUrls, r'.*\?|"\]$', '')
    

    See the regex demo

    Details

    • .*\? - any zero or more chars other than line breakchars, as many as possible and then ? char
    • | - or
    • "\]$ - a "] substring at the end of the string.

    Mind the regexp_replace syntax, you can't omit the replacement argument, see reference:

    REGEXP_REPLACE(value, regexp, replacement)

    Returns a STRING where all substrings of value that match regular expression regexp are replaced with replacement.

    You can use backslashed-escaped digits (\1 to \9) within the replacement argument to insert text matching the corresponding parenthesized group in the regexp pattern. Use \0 to refer to the entire matching text.