Logical OR not working in regular expression


I am working on a big log file whose entries are as follows:

-- "GET <b>/fss-w3-mtpage.php</b> HTTP/1.1" 200 0.084 41 "-" "c110bc/1.0" 127.0.0.1:25001  0.084

-- "GET <b>/m/firstpage/Services/getAll</b>?ids=ABCVDFDS,ASDASBDB,ASDBSA&requestId=091fa2b4-643e-4473-b6d8-40210b775dcf HTTP/1.1" 200

-- POST <b>/lastpage/Services/getAll</b>?ids=ABCVDFDS,ASDASBDB,ASDBSA&requestId=091fa2b4-643e-4473-b6d8-40210b775dcf HTTP/1.1" 200

I want to extract the part that is in bold in the samples above. Here is the regex that I wrote for this:

.*(POST|GET)\s+(([^\?]+)|([^\s])) 

I want to get the part that comes after GET or POST, up to but not including the first space ' ' or question mark '?', whichever comes first.

Problem
The logical OR in the latter part of the regex is not working. If I use only

.*(POST|GET)\s+([^\?]+)    

I get the correct portion, i.e. from GET or POST until the first question mark '?'. Similarly, if I use

.*(POST|GET)\s+([^\s]+)    

I get the correct portion, i.e. from GET or POST until the first space ' '.

Can anyone tell me where I am going wrong?


Solution

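  • An alternation tries its left branch first, and the right branch is only tried if the left one fails. Since [^\?]+ also matches spaces, it runs past the first space. Put both delimiters into a single negated character class instead, and get the matched group from index 2: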

    \b(POST|GET)\s+([^?\s]+)
    


    Pattern explanation:

      \b                       the word boundary
    
      (                        group and capture to \1:
        POST                     'POST'
       |                        OR
        GET                      'GET'
      )                        end of \1
    
      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or more times)
    
      (                        group and capture to \2:
    
        [^?\s]+                  any character except: '?', whitespace
                                 (\n, \r, \t, \f, and " ") (1 or more times)
    
      )                        end of \2
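
As a rough illustration, the pattern above could be applied with Python's re module along these lines. The choice of Python, the LOG_LINES list, and the extract_path helper are assumptions made for this sketch and are not part of the original question or answer. The sample lines are the question's entries with the <b> markers removed.

```python
import re

# Sample entries adapted from the question (the <b> markers are dropped).
LOG_LINES = [
    '-- "GET /fss-w3-mtpage.php HTTP/1.1" 200 0.084 41 "-" "c110bc/1.0" 127.0.0.1:25001  0.084',
    '-- "GET /m/firstpage/Services/getAll?ids=ABCVDFDS,ASDASBDB,ASDBSA&requestId=091fa2b4-643e-4473-b6d8-40210b775dcf HTTP/1.1" 200',
    '-- POST /lastpage/Services/getAll?ids=ABCVDFDS,ASDASBDB,ASDBSA&requestId=091fa2b4-643e-4473-b6d8-40210b775dcf HTTP/1.1" 200',
]

# Pattern from the answer: group 1 is the method, group 2 is the path.
FIXED = re.compile(r'\b(POST|GET)\s+([^?\s]+)')

# Pattern from the question, kept only for comparison.
ORIGINAL = re.compile(r'.*(POST|GET)\s+(([^\?]+)|([^\s]))')


def extract_path(line, pattern=FIXED):
    """Hypothetical helper: return group 2 of the first match, or None."""
    match = pattern.search(line)
    return match.group(2) if match else None


paths = [extract_path(line) for line in LOG_LINES]
for path in paths:
    print(path)
# /fss-w3-mtpage.php
# /m/firstpage/Services/getAll
# /lastpage/Services/getAll

# With the question's pattern, the first entry has no '?', so the left
# branch [^\?]+ keeps going past the space to the end of the line.
print(extract_path(LOG_LINES[0], ORIGINAL))
```

The last call shows the behaviour described in the question: for the first entry, group 2 of the original pattern starts with the path but carries on through " HTTP/1.1" and the rest of the line.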