
Regex to remove a dynamic word from a URL


In Dynatrace, I have URLs that contain a word which is dynamic. I want to remove that dynamic word from the URL using a regex.

Below are the different URLs:

  • /aaa/fdsadx/drtyu/ab_cd/myword?Id=953
  • /asd/XXXXX/sadsa/two/xx_yy?Id=953
  • /asd/fdsadx/df/three/pp_qq/myword
  • /asd/fdsadx/sadsa/ab_cd
  • /SSS/fdsadx/cvnm/forth/gg_hh

Expected output:

  • /asd/fdsadx/sadsa//myword?Id=953
  • /asd/fdsadx/sadsa/?Id=953
  • /asd/fdsadx/sadsa//myword
  • /asd/fdsadx/sadsa/

I was able to come up with this regex:

(\S+?)ab_cd(.*)

But it's not working for dynamic values or for all of the URLs. How can I improve the regex to remove the dynamic value?


Solution

  • You could use two capturing groups and, after matching a forward slash, match the segment that contains the underscore

    ^(\S+/)[^\s_]+_[^\s_/?]+(.*)
    
    • ^ Start of string
    • (\S+/) Capture group 1, match 1+ times a non-whitespace char followed by /
    • [^\s_]+ Match 1+ times any char except a whitespace char or _
    • _ Match an underscore literally
    • [^\s_/?]+ Match 1+ times any char except a whitespace char, _, / or ?
    • (.*) Capture group 2, match 0+ times any char except a newline

    In the replacement, use the two capturing groups, for example $1$2

    If you want to match country codes and you know that they consist, for example, of the chars a-zA-Z, you could make the character class more specific

    ^(\S+/)[A-Za-z]+_[A-Za-z]+(.*)
    
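
To check both patterns outside of Dynatrace, here is a minimal Python sketch that applies them to the sample URLs from the question. The helper name strip_dynamic_segment and the extra test URL containing digits are assumptions made for illustration, and Python writes the $1$2 replacement as \1\2. Dynatrace's own regex engine may differ in details, so treat this as a quick way to reason about the patterns rather than a Dynatrace configuration.

```python
import re

# The two patterns from the answer. Group 1 captures everything up to and
# including the slash before the underscore segment; group 2 captures the rest.
GENERIC = re.compile(r"^(\S+/)[^\s_]+_[^\s_/?]+(.*)")
LETTERS_ONLY = re.compile(r"^(\S+/)[A-Za-z]+_[A-Za-z]+(.*)")


def strip_dynamic_segment(url, pattern=GENERIC):
    """Hypothetical helper: drop the underscore segment, keep groups 1 and 2."""
    # r"\1\2" is the Python equivalent of the $1$2 replacement.
    return pattern.sub(r"\1\2", url)


# Sample URLs taken from the question.
urls = [
    "/aaa/fdsadx/drtyu/ab_cd/myword?Id=953",
    "/asd/XXXXX/sadsa/two/xx_yy?Id=953",
    "/asd/fdsadx/df/three/pp_qq/myword",
    "/asd/fdsadx/sadsa/ab_cd",
    "/SSS/fdsadx/cvnm/forth/gg_hh",
]

for url in urls:
    print(url, "->", strip_dynamic_segment(url))

# Assumed extra input: a segment with digits is removed by the generic
# pattern but left untouched by the letters-only pattern.
print(strip_dynamic_segment("/asd/fdsadx/sadsa/a1_b2"))
print(strip_dynamic_segment("/asd/fdsadx/sadsa/a1_b2", LETTERS_ONLY))
```

Because \S+ is greedy, group 1 extends to the last slash that is still followed by a segment of the form xx_yy, which is why the path in front of the dynamic word is kept as-is. The slash after the removed segment is part of group 2, so URLs with a trailing part end up with a double slash, as in the expected output.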