regex rex

Extract the User-Agent from an HTTP request


I'm trying to extract the User-Agent value from an HTTP request into a separate field named "UserAgent", but so far without success. It looks like I need to match up to the carriage return and line feed? I'd appreciate any help.
Below is the regex101 link.

https://regex101.com/r/rdu8yE/1

POST /xxx/yyyy HTTP/1.1\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0\r\nAccept: application/json\r\nAccept-Language: en-US,en;q=0.5\r\nX-XSRF-TOKEN: 989g8ddfgdf7979df\r\ntimestamp: 2021-04-07T18:35:50.639Z\r\nContent-Type: application/json;charset=utf-8\r\nContent-Length: 340\r\nOrigin: https://example.com\r\nConnection: keep-alive\r\nReferer: https://my.example.com\r\n


Solution

  • Perhaps this? ^.*?User-Agent: (?<UserAgent>.*?)\\r\\n

    It does a little more than you need: it anchors at the beginning of the string, consumes (but doesn't capture) everything up to the marker for the user agent (here the literal text User-Agent: followed by a space), then captures everything up to, but not including, the next CRLF, which in your data is the literal text \r\n.

    Your original regex, User-Agent: (?<UserAgent>[^\\r\\n]*), puts a literal backslash, 'r' and 'n' inside the character class (the part in square brackets), and any one of those characters stops the capture. The 'n' is the first one encountered, so you only capture until partway through the word 'Windows'.

    Comment back if this is unclear :)
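
To see the difference between the two patterns, here is a minimal sketch in Python. This is an assumption for illustration only: the question does not use Python, and Python writes named groups as (?P<UserAgent>...) rather than (?<UserAgent>...). The sample event is shortened from the one in the question, and the variable names are hypothetical. The raw strings keep \r\n as the literal characters backslash, r, backslash, n, as in the posted data.

```python
import re

# Sample event, shortened from the question. Raw strings keep "\r\n" as
# literal text (backslash, r, backslash, n), not as real control characters.
event = (
    r"POST /xxx/yyyy HTTP/1.1\r\n"
    r"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) "
    r"Gecko/20100101 Firefox/86.0\r\n"
    r"Accept: application/json\r\n"
)

# Suggested pattern: lazily capture up to the next literal "\r\n".
# Python syntax for the named group is (?P<name>...), an adaptation.
fixed = re.compile(r"^.*?User-Agent: (?P<UserAgent>.*?)\\r\\n")

# Original pattern: the class [^\\r\\n] excludes backslash, 'r' and 'n',
# so the capture stops at the first of those characters.
original = re.compile(r"User-Agent: (?P<UserAgent>[^\\r\\n]*)")

fixed_value = fixed.search(event).group("UserAgent")
original_value = original.search(event).group("UserAgent")

print(fixed_value)     # full User-Agent string
print(original_value)  # stops at the 'n' in "Windows"
```

With this sample, the suggested pattern yields the whole User-Agent value, while the original pattern yields only "Mozilla/5.0 (Wi".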