I have two types of URL's which I would need to clean, they look like this:
["//xxx.com/se/something?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]
["//www.xxx.com/se/car?p_color_car=White?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]
The outcome I want is;
SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"
First URL
["//xxx.com/se/something?
Second URL:
["//www.xxx.com/se/car?p_color_car=White?
I can't get my head around it,I've tried this .*\/
.
But it will still keep strings I don't want such as:
(1 url) =
something?
(2 url) car?p_color_car=White?
You can use
regexp_replace(FinalUrls, r'.*\?|"\]$', '')
See the regex demo
Details
.*\?
- any zero or more chars other than line breakchars, as many as possible and then ?
char|
- or"\]$
- a "]
substring at the end of the string.Mind the regexp_replace
syntax, you can't omit the replacement argument, see reference:
REGEXP_REPLACE(value, regexp, replacement)
Returns a
STRING
where all substrings ofvalue
that match regular expressionregexp
are replaced withreplacement
.You can use backslashed-escaped digits (\1 to \9) within the
replacement
argument to insert text matching the corresponding parenthesized group in theregexp
pattern. Use \0 to refer to the entire matching text.