I have two sentences that I'm working with, and I'm interested in making specific capture groups based on the characters in a word. These are the two Spanish sentences:
The first capture group has to be one of the verbs, i.e. "quiero" or "puedo", so I do that with this regex: ([PpDdQq].*o)
The second capture group has to be a word that directly follows the verb and ends in "me", and I do that with (\w*me)
Now for the last capture group: it has to be all the words and blank spaces that come directly after the first capture group when the next word does not end in "-me", or directly after the second capture group when the next word does end in "-me". I used (\w.+), but it didn't work.
Could anybody help me figure out why? Thanks. Below is the full regex, along with a link to the regex website containing the expression and the examples to be matched:
([PpDdQq].*o) |(\w*me)|(\w.+)
Use
\b([PpDdQq]\w*o)(?:\s+(\w*me))?\b(.*)
See regex proof.
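To try the pattern above outside a regex tester, here is a minimal Python sketch. The two sample sentences and the helper name split_sentence are assumptions made for illustration, since the original sentences are not shown in the question.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r"\b([PpDdQq]\w*o)(?:\s+(\w*me))?\b(.*)")

# Hypothetical sample sentences (not the ones from the question).
samples = [
    "Quiero comprarme un coche nuevo",  # verb followed by a word ending in "me"
    "Puedo comer una manzana",          # verb followed by a word not ending in "me"
]


def split_sentence(sentence):
    """Return (verb, me_word_or_None, rest), or None if the pattern does not match."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return match.groups()


for sentence in samples:
    print(split_sentence(sentence))
# ('Quiero', 'comprarme', ' un coche nuevo')
# ('Puedo', None, ' comer una manzana')
```

Note that the third group keeps the leading space; calling .strip() on it removes that if it is not wanted. In the second sentence, "come" inside "comer" is not captured as group 2 because the \b after the optional group requires the "me" word to end there.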
EXPLANATION
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [PpDdQq]                 any character of: 'P', 'p', 'D', 'd',
                             'Q', 'q'
--------------------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    o                        'o'
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (                        group and capture to \2:
--------------------------------------------------------------------------------
      \w*                      word characters (a-z, A-Z, 0-9, _) (0
                               or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
      me                       'me'
--------------------------------------------------------------------------------
    )                        end of \2
--------------------------------------------------------------------------------
  )?                       end of grouping
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (                        group and capture to \3:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \3
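As for why the pattern from the question did not work: the | between the groups makes them alternatives, so any single match fills only one of the three groups, and the greedy .* in the first group can run on to the last "o" that is followed by a space. The Python sketch below illustrates both effects. The sample sentences and the helper name original_matches are assumptions made for illustration, not taken from the question.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"([PpDdQq].*o) |(\w*me)|(\w.+)")


def original_matches(sentence):
    """Return the groups of every successive match of the original pattern."""
    return [match.groups() for match in ORIGINAL.finditer(sentence)]


# Hypothetical sentence: each match fills exactly one group.
print(original_matches("Quiero comprarme un coche nuevo"))
# [('Quiero', None, None), (None, 'comprarme', None), (None, None, 'un coche nuevo')]

# Hypothetical sentence with a later word ending in "o" before a space:
# the greedy .* in group 1 swallows everything up to that word.
print(original_matches("Puedo comer pescado fresco hoy"))
# [('Puedo comer pescado fresco', None, None), (None, None, 'hoy')]
```

The corrected pattern avoids both problems by chaining the groups in sequence instead of alternating them, and by using \w* rather than .* so the first group cannot extend past a single word.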