I'm currently trying to extract the following from patterns like @Apple(kind="Bax", priority=33)
What I currently use is
@([^(]*)\(([^\)]*)\)
Then I have Apple and kind="Bax", priority=33. After this, I split group 2 on , then split on = and finally remove " if any at start or end.
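For reference, the approach described above could be sketched in Python roughly as follows. This is only an illustration: the function name parse_with_splits is hypothetical, the whitespace stripping is an added assumption, and str.partition stands in for the "split on =" step.

```python
import re

# The regex from the question: group 1 is the name, group 2 the argument list.
PATTERN = re.compile(r'@([^(]*)\(([^\)]*)\)')

def parse_with_splits(text):
    """Hypothetical helper: regex capture first, then split on ',' and '='."""
    m = PATTERN.search(text)
    if m is None:
        return None
    name, body = m.group(1), m.group(2)
    pairs = {}
    for item in body.split(','):           # second pass over the body
        key, _, value = item.partition('=')  # another pass per item
        # Assumption: surrounding whitespace is not significant.
        pairs[key.strip()] = value.strip().strip('"')
    return name, pairs

print(parse_with_splits('@Apple(kind="Bax", priority=33)'))
# ('Apple', {'kind': 'Bax', 'priority': '33'})
```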
Now this will traverse the second segment a lot: first for the regex capture, then to find every , and then once more for each piece to find = and so on.
Since I do this millions of times, is there any way to capture it within the regex traversal? I'd like to avoid all the splits.
Assuming you want to allow an arbitrary number of key=value pairs, how about:
(?:@|\(|,\s*|="?)(\w+)(?=\(|=|"|,|\))
All captures are in Group 1.
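A minimal sketch of how this regex might be used from Python, assuming values consist only of word characters (letters, digits, underscore). The names TOKEN and parse are hypothetical, and pairing the tokens into a dict is an extra step that is not part of the regex itself.

```python
import re

# The single-pass regex from above; every token lands in group 1.
TOKEN = re.compile(r'(?:@|\(|,\s*|="?)(\w+)(?=\(|=|"|,|\))')

def parse(text):
    """Hypothetical helper: first token is the name, the rest alternate key, value."""
    tokens = TOKEN.findall(text)  # with one group, findall returns group 1 only
    if not tokens:
        return None
    name, rest = tokens[0], tokens[1:]
    return name, dict(zip(rest[0::2], rest[1::2]))

print(TOKEN.findall('@Apple(kind="Bax", priority=33)'))
# ['Apple', 'kind', 'Bax', 'priority', '33']
print(parse('@Apple(kind="Bax", priority=33)'))
# ('Apple', {'kind': 'Bax', 'priority': '33'})
```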
(?:@|\(|,\s*|="?) matches one of @, (, a comma followed by zero or more whitespace characters, or = followed by an optional ".
(\w+) matches the desired word and captures it in Group 1.
(?=\(|=|"|,|\)) is a positive lookahead assertion that requires the next character to be one of (, =, ", , or ).
[Edit]
If the element enclosed by the double quotes may contain a comma, it will not be easy to parse it with a single regex. Even if it is possible, the result will be less maintainable. I would divide the operation into two steps. Suppose we have a string:
@Apple(val="a,b", kind="Bax", priority=33,foo=bar, name="John Doe", lorem=ipsum)
Then with the 1st regex:
^@([^(]+)\(([^)]+)\)
Apple is captured in Group 1 and the substring enclosed in the parentheses is captured in Group 2.
Then apply the next regex to Group 2:
(?<=")[^"=]+(?=")|[^,=" ]+
Now we can obtain the list:
['val', 'a,b', 'kind', 'Bax', 'priority', '33', 'foo', 'bar', 'name', 'John Doe', 'lorem', 'ipsum']
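The two steps could be put together in Python along these lines. This is a sketch under assumptions: the names OUTER, INNER and parse_two_step are hypothetical, and it presumes that quoted values never contain =, " or ), since neither regex handles those cases.

```python
import re

# Step 1: name in group 1, everything between the parentheses in group 2.
OUTER = re.compile(r'^@([^(]+)\(([^)]+)\)')
# Step 2: either a quoted value (quotes excluded) or a bare key/value.
INNER = re.compile(r'(?<=")[^"=]+(?=")|[^,=" ]+')

def parse_two_step(text):
    """Hypothetical helper combining both regexes."""
    m = OUTER.match(text)
    if m is None:
        return None
    name, body = m.groups()
    parts = INNER.findall(body)
    # Tokens alternate key, value, so pair them up.
    return name, dict(zip(parts[0::2], parts[1::2]))

s = '@Apple(val="a,b", kind="Bax", priority=33,foo=bar, name="John Doe", lorem=ipsum)'
print(parse_two_step(s))
# ('Apple', {'val': 'a,b', 'kind': 'Bax', 'priority': '33', 'foo': 'bar',
#            'name': 'John Doe', 'lorem': 'ipsum'})
```

Compiling both patterns once at module level avoids recompiling them on every call, which may matter when this runs millions of times.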