regex url grok

Regexp: replace all digits in a URL after the third slash


How can I use a regexp to replace every digit in a URL after the third slash with a # character? The number of # characters must equal the number of digits replaced. Digits can appear in more than one slash-separated section, and their position is not fixed, but only digits after the third slash should be replaced.

Examples:

/path/to/something/1234/end
/path/to/something/12/1234/end

To:

/path/to/something/####/end
/path/to/something/##/####/end

I tried to use an expression, but it does not give the desired result:

"(?<=/)\\d+(?=/|$), #####"

This regexp is needed to implement the grok pattern in Logstash (gsub function).

P.S. Why after the third slash? Because digits can also appear earlier in the path, and those must not be changed (e.g. /path/to_1/something/1234/end).


Solution

  • You can use

    (?:\G(?!^)|^((?:/[^/]*){3}/))(\D*)\d
    

    as regex and $1$2# as replacement.

    Details:

    • (?:\G(?!^)|^((?:/[^/]*){3}/)) - either the end of the previous match (\G(?!^)), or (|) the start of the string followed by three occurrences of a / plus zero or more non-slash chars, and then one more / char, all captured into Group 1 (^((?:/[^/]*){3}/))
    • (\D*) - Group 2: any zero or more non-digits
    • \d - a digit

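    The replacement is a concatenation of the Group 1 and Group 2 values and a # char. Each match consumes exactly one digit, so the number of # chars equals the number of digits replaced. Group 1 only participates in the first match, so the prefix is written back once.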
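
To check the expected output outside Logstash, here is a minimal Python sketch. Python's built-in re module does not support \G, so this is not the regex above but a swapped-in two-step technique: first capture the same prefix that Group 1 matches, then replace every digit in the remainder. The names PREFIX_AND_REST and mask_digits are hypothetical and used only for illustration.

```python
import re

# Group 1: three "/segment" parts plus the slash that follows them
#          (the same prefix as Group 1 in the regex above).
# Group 2: everything after that prefix.
PREFIX_AND_REST = re.compile(r'^((?:/[^/]*){3}/)(.*)$')


def mask_digits(url: str) -> str:
    """Replace each digit after the protected prefix with a single '#'."""
    m = PREFIX_AND_REST.match(url)
    if m is None:
        # Path is too short to contain the prefix: leave it unchanged.
        return url
    prefix, rest = m.groups()
    # One '#' per digit, so the count of '#' equals the count of digits.
    return prefix + re.sub(r'\d', '#', rest)


print(mask_digits('/path/to/something/1234/end'))     # /path/to/something/####/end
print(mask_digits('/path/to/something/12/1234/end'))  # /path/to/something/##/####/end
print(mask_digits('/path/to_1/something/1234/end'))   # /path/to_1/something/####/end
```

The third call shows that the digit in to_1 sits inside the protected prefix and is left untouched, which matches the requirement in the question.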