I am working on a large log file whose entries look like this:
-- "GET <b>/fss-w3-mtpage.php</b> HTTP/1.1" 200 0.084 41 "-" "c110bc/1.0" 127.0.0.1:25001 0.084
-- "GET <b>/m/firstpage/Services/getAll</b>?ids=ABCVDFDS,ASDASBDB,ASDBSA&requestId=091fa2b4-643e-4473-b6d8-40210b775dcf HTTP/1.1" 200
-- POST <b>/lastpage/Services/getAll</b>?ids=ABCVDFDS,ASDASBDB,ASDBSA&requestId=091fa2b4-643e-4473-b6d8-40210b775dcf HTTP/1.1" 200
I want to extract the part shown in bold in the sample above. Here is the regex I wrote for it:
.*(POST|GET)\s+(([^\?]+)|([^\s]))
I want to get the part that comes after GET or POST, up to the first occurrence of a space ' ' or a question mark '?'.
Problem
The logical OR in the latter part of the regex is not working.
If I use only
.*(POST|GET)\s+([^\?]+)
I get the correct portion, i.e. from GET or POST up to the first question mark '?'. Similarly, if I use
.*(POST|GET)\s+([^\s]+)
I get the correct portion, i.e. from GET or POST up to the first space ' '.
Can anyone please tell me where I am going wrong?
Put both stop characters in a single negated character class and take the matched group at index 2:
\b(POST|GET)\s+([^?\s]+)
Here is DEMO
Pattern explanation:
\b            the word boundary
(             group and capture to \1:
  POST        'POST'
 |            OR
  GET         'GET'
)             end of \1
\s+           whitespace (\n, \r, \t, \f, and " ") (1 or more times)
(             group and capture to \2:
  [^?\s]+     any character except: '?', whitespace (\n, \r, \t, \f, and " ") (1 or more times)
)             end of \2
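
As for why the original alternation seems to do nothing: the regex engine tries the left branch `[^\?]+` first, and that branch happily consumes spaces, so it always succeeds and the right branch `[^\s]` (which also lacks a `+`) is never tried. The sketch below illustrates this in Python, assuming the log lines are plain strings with the bold markup removed; the names `LOG_LINES`, `FIXED`, `ORIGINAL` and `extract_path` are made up for this illustration.

```python
import re

# Sample lines adapted from the question (bold markup removed).
LOG_LINES = [
    '-- "GET /fss-w3-mtpage.php HTTP/1.1" 200 0.084 41 "-" "c110bc/1.0" 127.0.0.1:25001 0.084',
    '-- "GET /m/firstpage/Services/getAll?ids=ABCVDFDS,ASDASBDB,ASDBSA'
    '&requestId=091fa2b4-643e-4473-b6d8-40210b775dcf HTTP/1.1" 200',
    '-- POST /lastpage/Services/getAll?ids=ABCVDFDS,ASDASBDB,ASDBSA'
    '&requestId=091fa2b4-643e-4473-b6d8-40210b775dcf HTTP/1.1" 200',
]

# Pattern from the answer: stop at the first '?' or whitespace character.
FIXED = re.compile(r'\b(POST|GET)\s+([^?\s]+)')

# Pattern from the question, kept only to show how it over-matches.
ORIGINAL = re.compile(r'.*(POST|GET)\s+(([^\?]+)|([^\s]))')


def extract_path(line):
    """Return the request path (group 2), or None if the line has no GET/POST."""
    match = FIXED.search(line)
    return match.group(2) if match else None


paths = [extract_path(line) for line in LOG_LINES]
for path in paths:
    print(path)

# With the original pattern, the first line has no '?', so the left
# branch runs across the spaces to the end of the line.
overmatched = ORIGINAL.search(LOG_LINES[0]).group(2)
print(repr(overmatched))
```

Using `search` rather than `match` means the pattern does not need a leading `.*`, and `\b` keeps it from matching GET or POST in the middle of a longer word.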