
Regex Capture one word OR two words in quotes


I'm trying to implement Gmail-style filters in my search and I'm stuck on this regex problem. I need to capture one word, or two words in quotes (without the quotation marks themselves). This is PCRE (PHP).

For example:

name:mark

desired result: 1st capture group should be mark

name:"mark"

desired result: 1st capture group should be mark

name:"mark wilson"

desired result: 1st capture group should be mark, second capture group should be wilson

name:mark wilson

desired result: 1st capture group should be mark, wilson is ignored

The closest I've gotten is name:(\w+|\"\w+(?>\"|\s([a-z.'-]+\"))). It captures example 1 perfectly, but example 2 still includes the quotes, and example 3 ends up as:

group 1: "mark wilson" (quotes included)

group 2: wilson" (quote included)

I've tried lookaheads and lookbehinds, but I'm not getting anywhere with those either.

Any help would be much appreciated. TIA


Solution

  • One option is a conditional (if/else) group. Group 1 captures the optional opening ", and the conditional (?(1)") requires a closing " only if group 1 matched. The value without quotes (mark, or mark wilson for two words) is then in group 2, and the second word (wilson), if present, is in group 3.

    \w+:(")?(\w+(?:\h+(\w+))?)(?(1)")
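A minimal PHP sketch of how the groups of this pattern come out, assuming PHP 7.2+ for PREG_UNMATCHED_AS_NULL. The helper name match_conditional and the array keys are made up for illustration; only the pattern comes from the answer.

```php
<?php
// Pattern copied from the answer; the helper name is hypothetical.
function match_conditional(string $input): ?array
{
    $pattern = '/\w+:(")?(\w+(?:\h+(\w+))?)(?(1)")/';
    // PREG_UNMATCHED_AS_NULL (PHP 7.2+) reports unset groups as null
    // instead of an empty string.
    if (preg_match($pattern, $input, $m, PREG_UNMATCHED_AS_NULL) !== 1) {
        return null;
    }
    return [
        'quoted' => $m[1] !== null,   // group 1: the opening quote, if any
        'value'  => $m[2],            // group 2: one or two words, no quotes
        'second' => $m[3] ?? null,    // group 3: the second word, if any
    ];
}

$inputs = ['name:mark', 'name:"mark"', 'name:"mark wilson"', 'name:mark wilson'];
foreach ($inputs as $input) {
    echo $input, ' => ', json_encode(match_conditional($input)), PHP_EOL;
}
```

Note that for the unquoted input name:mark wilson this pattern still matches both words (group 2 is mark wilson, group 3 is wilson), because the optional second word does not depend on the quote. If wilson must be ignored there, check the 'quoted' flag in code or use the branch reset version.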
    


    If the first word should be available on its own (without the whitespace and the second word), you can wrap it in its own group; the two words are then in groups 3 and 4.

    \w+:(")?((\w+)(?:\h+(\w+))?)(?(1)")
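As a rough illustration, the two words can be read from groups 3 and 4 like this. The function name split_name is hypothetical, and dropping the second word for unquoted input is an extra step added here to match the question's fourth example; the pattern itself does not do that.

```php
<?php
// Hypothetical helper around the answer's second pattern.
function split_name(string $input): ?array
{
    $pattern = '/\w+:(")?((\w+)(?:\h+(\w+))?)(?(1)")/';
    if (preg_match($pattern, $input, $m, PREG_UNMATCHED_AS_NULL) !== 1) {
        return null;
    }
    // Group 1 is null when there was no opening quote.
    $quoted = $m[1] !== null;
    return [
        'first'  => $m[3],
        // Assumption: keep the second word only for quoted input,
        // so that name:mark wilson yields just mark.
        'second' => $quoted ? ($m[4] ?? null) : null,
    ];
}

var_dump(split_name('name:"mark wilson"'));
var_dump(split_name('name:mark wilson'));
```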
    


    You could also use a branch reset group. The single word (quoted or not), or the first of two quoted words, is then always in group 1, and the second quoted word is in group 2:


    \w+:(?|"(\w+)(?:\h+(\w+))?"|(\w+))
    

    Explanation

    • \w+: Match 1+ word chars followed by :
    • (?| Branch reset group
      • "(\w+) Match ", then capture 1+ word chars in group 1
      • (?: Non capture group
        • \h+ match 1+ horizontal whitespace chars
        • (\w+) Capture group 2, match 1+ word chars
      • )? Close group and make optional
      • " Match "
      • | Or
      • (\w+) Capture group 1 again (group numbering restarts in each alternative), match 1+ word chars
    • ) Close branch reset group
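The breakdown above can be tried out with a short PHP sketch. The function name captured_words is an assumption for illustration; the pattern is the branch reset version from the answer.

```php
<?php
// Hypothetical helper: returns the captured words as a plain list.
function captured_words(string $input): array
{
    $pattern = '/\w+:(?|"(\w+)(?:\h+(\w+))?"|(\w+))/';
    if (preg_match($pattern, $input, $m) !== 1) {
        return [];
    }
    // $m[0] is the whole match; groups 1 and 2 follow.
    // Without PREG_UNMATCHED_AS_NULL, a trailing unset group 2 is simply absent.
    return array_slice($m, 1);
}

$inputs = ['name:mark', 'name:"mark"', 'name:"mark wilson"', 'name:mark wilson'];
foreach ($inputs as $input) {
    echo $input, ' => ', json_encode(captured_words($input)), PHP_EOL;
}
```

With this version the unquoted name:mark wilson yields only mark, since the second word is captured only inside the quoted alternative.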
