
Use Sed to replace numbers in URL within Splunk


How can I replace alphanumeric values within a URL? The query below isn't replacing the right values.

Example Input Data:

/example/endpoint/here/34456dwf45
/endpoint/fddk449372
/434236/example/endpoint

Expected Output:

/example/endpoint/here/my_var
/endpoint/my_var
/my_var/example/endpoint

Current Query:

* | rex mode=sed field=request_url "s/(.*\\/)[^\/]+(\/.*)/\1my_var\2/" 
  | stats values(request_url)

How can I use sed to replace any alphanumeric value between two / characters in a URL with a fixed string?


Solution

  • You may use the following sed command:

    "s,(^|/)[[:alpha:]]*[[:digit:]][[:alnum:]]*($|/),\1my_var\2,"
    

    Or, to replace all occurrences, including adjacent segments whose matches would otherwise overlap on a shared /, use (?![^/]) instead of ($|/) and add the g flag at the end:

    "s,(^|/)[[:alpha:]]*[[:digit:]][[:alnum:]]*(?![^/]),\1my_var,g"
    

    See the first regex demo and the second regex demo.

    The leading s is the substitute command. Commas are used as delimiters so that forward slashes in the pattern do not need to be escaped.

    The (^|/)[[:alpha:]]*[[:digit:]][[:alnum:]]*($|/) pattern matches

    • (^|/) - Group 1 (\1): start of a line or /
    • [[:alpha:]]*[[:digit:]][[:alnum:]]* - zero or more letters, one digit, then zero or more letters or digits
    • ($|/) - Group 2 (\2): end of a line or /
    • (?![^/]) - (second variant only) a negative lookahead that fails if the next character is anything other than /, so the match must be followed by / or the end of the string. The / is not consumed, so the next match can start at it.
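
To sanity-check the two patterns outside Splunk, here is a minimal Python sketch. It is not the Splunk query itself. It assumes ASCII-only URLs and swaps the POSIX classes for explicit ranges ([A-Za-z], [0-9], [A-Za-z0-9]), because Python's re module does not support [[:alpha:]]-style classes. The function names and the /123/abc4/x path are made up for illustration.

```python
import re

# Assumption: ASCII ranges stand in for [[:alpha:]], [[:digit:]], [[:alnum:]].
SEGMENT = r"[A-Za-z]*[0-9][A-Za-z0-9]*"

# First variant: Group 2 consumes the trailing "/" (or matches end of string).
FIRST = re.compile(r"(^|/)" + SEGMENT + r"($|/)")

# Second variant: the lookahead checks for "/" or end of string without consuming it.
ALL = re.compile(r"(^|/)" + SEGMENT + r"(?![^/])")


def replace_first(url: str) -> str:
    """Mimic the sed expression without the g flag: replace the first match only."""
    return FIRST.sub(r"\1my_var\2", url, count=1)


def replace_all(url: str) -> str:
    """Mimic the sed expression with the g flag and the lookahead."""
    return ALL.sub(r"\1my_var", url)


if __name__ == "__main__":
    samples = [
        "/example/endpoint/here/34456dwf45",
        "/endpoint/fddk449372",
        "/434236/example/endpoint",
    ]
    for url in samples:
        print(url, "->", replace_first(url))

    # Hypothetical URL with two adjacent matching segments.
    adjacent = "/123/abc4/x"
    print(FIRST.sub(r"\1my_var\2", adjacent))  # /my_var/abc4/x
    print(replace_all(adjacent))               # /my_var/my_var/x
```

The last two lines show why the lookahead matters. With ($|/), the first match swallows the / that the next segment needs as its left boundary, so abc4 is skipped even when replacing globally. With (?![^/]), that / stays available and both segments are replaced.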