I'm using the following regex:
((?:_missing_:|_exists_:)[a-z0-9]+)|(([a-z0-9]+)(?=:))
to match a Lucene query string:
_missing_:title age:(>=10 AND < 20) AND age:123 AND _exists_:title123
The first non-capturing group is not respected: the match returns _missing_:title and not title. Using a positive lookahead makes the entire regex fail to match anything.
It should return the following array:
['title', 'age', 'age', 'title123']
Change your regex as shown below, then take the strings you want from capture groups 1 and 2.
(?:_missing_:|_exists_:)([a-z0-9]+)|([a-z0-9]+)(?=:)
You don't need to put the non-capturing group (?:_missing_:|_exists_:) inside a capturing group. That outer capturing group is the reason the match returns _missing_:title instead of title. A capturing group around the character class alone is enough.
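
As a rough illustration of reading groups 1 and 2, here is a minimal sketch. The question does not say which language is in use, so Python's re module is an assumption, and the helper name field_names is hypothetical. The character class is the question's [a-z0-9], so that the digit 0 is also accepted.

```python
import re

# Pattern from the question: the outer capturing group wraps the prefix too.
ORIGINAL = re.compile(r"((?:_missing_:|_exists_:)[a-z0-9]+)|(([a-z0-9]+)(?=:))")

# Revised pattern: only the field name is captured (group 1 or group 2).
REVISED = re.compile(r"(?:_missing_:|_exists_:)([a-z0-9]+)|([a-z0-9]+)(?=:)")


def field_names(query):
    """Return the field names found in a query string (hypothetical helper)."""
    # Exactly one of the two groups participates in each match;
    # the other is None, so `or` picks whichever one matched.
    return [m.group(1) or m.group(2) for m in REVISED.finditer(query)]


query = "_missing_:title age:(>=10 AND < 20) AND age:123 AND _exists_:title123"

# With the original pattern, group 1 of the first match still carries the prefix.
first_original = ORIGINAL.search(query).group(1)
print(first_original)

print(field_names(query))
```

Values such as 10, 20 and 123 are skipped because they are not followed by a colon, and AND is skipped because the character class is lowercase only.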