I am looking for a C# regex solution to match/capture some small but complex chunks of data. I have thousands of unstructured chunks of data in my database (they come from a third-party data store) that look similar to this:
not BATTCOMPAR{275} and FORKCARRIA{ForkSpreader} and SIDESHIFT{WithSSPassAttachCenterLine} and TILTANGLE{4up_2down} and not AUTOMATSS{true} and not FORKLASGUI{true} and not FORKCAMSYS{true} and OKED{true}
I want to be able to split that up into discrete pieces (regex match/capture) like the following:
not BATTCOMPAR{275}
and FORKCARRIA{ForkSpreader}
and SIDESHIFT{WithSSPassAttachCenterLine}
and TILTANGLE{4up_2down}
and not AUTOMATSS{true}
and not FORKLASGUI{true}
and not FORKCAMSYS{true}
and OKED{true}
CONTAINER{Container}
The data will always conform to the following rules:
Each chunk always ends with a curly brace grouping, such as {275}.
Each chunk is prefixed by an operator: "not", "and", "and not", or nothing. The "nothing" is the same as "and" and will only occur when it's the first chunk in the string. For example, if my and OKED{true} had come at the beginning of the string, the "and" would have been omitted and OKED{true} would have been prefixed by nothing (empty string). But it's the same as an "and".
After the operator ("and", "not", "and not", or nothing) there will always be a string designator that ends just before the curly brace grouping. Example: BATTCOMPAR
So each chunk consists of an operator (such as and not), a string designator (such as BATTCOMPAR), and a curly brace grouping (such as {ForkSpreader}).
I have experimented with a few different regex constructions:
Match curly brace groupings:
Regex regex = new Regex(@"{(.*?)}");
return regex.Matches(str);
The above almost works, but gets only the curly brace groupings and not the operator and string designator that goes with it.
Capture chunks based on string prefix, trying to match operator strings:
var capturedWords = new List<string>();
string regex = $@"(?<!\w){prefix}\w+";
foreach (Match match in Regex.Matches(haystack, regex))
{
    capturedWords.Add(match.Value);
}
return capturedWords;
The above partially works, but gets only the operators, and not the entire chunk I need: (operator + string designator + curly brace grouping)
This works for me on Regex Tester: /([and\s|or\s|not\s]+)?.*?(\{.*?\})/mg
On DotNet Fiddle, this worked for me:
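() - capture group
[and\\s|or\\s|not\\s]+? - one or more characters from the set a, n, d, o, r, t, | or whitespace. Note that square brackets define a character class, not an alternation of the words and, or, not. It works on the sample data because each match only needs to start with one of those characters. To match only the whole words, use a group such as (?:and|or|not) instead.
.*? - any combination of characters or none, e.g. BATTCOMPAR
\\{.*?\\} - the final part enclosed in curly braces, which contains any combination of characters or none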
// Requires: using System; using System.Text.RegularExpressions;
string test = "not BATTCOMPAR{275} and FORKCARRIA{ForkSpreader} and SIDESHIFT{WithSSPassAttachCenterLine} and TILTANGLE{4up_2down} and not AUTOMATSS{true} and not FORKLASGUI{true} and not FORKCAMSYS{true} and OKED{true}";
Regex r = new Regex("([and\\s|or\\s|not\\s]+?.*?\\{.*?\\})", RegexOptions.Multiline);
// Or, if you need to account for chunks that have no leading
// operator (and, not, and not):
//Regex r = new Regex("([and\\s|or\\s|not\\s|]+?.*?\\{.*?\\}|.*?\\{.*?\\})", RegexOptions.Multiline);
MatchCollection matches = r.Matches(test);
foreach (Match m in matches)
{
    // Every match after the first starts with the separating space, so trim it.
    Console.WriteLine(m.Value.Trim());
}
Prints:
//not BATTCOMPAR{275}
//and FORKCARRIA{ForkSpreader}
//and SIDESHIFT{WithSSPassAttachCenterLine}
//and TILTANGLE{4up_2down}
//and not AUTOMATSS{true}
//and not FORKLASGUI{true}
//and not FORKCAMSYS{true}
//and OKED{true}
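If you also want the three parts of each chunk separately, here is a minimal sketch that matches the operator as an alternation of whole words and exposes operator, designator and curly brace grouping as named groups. The names ChunkParser, Parse, op, name and value are made up for illustration. It assumes the designator consists only of word characters (\w), that the braces never nest, and that "or" / "or not" may also occur (the question only lists not, and, and not).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
public static class ChunkParser
{
    // (?<op>...)    optional operator: and, or, and not, or not, not (whole words)
    // (?<name>\w+)  string designator, assumed to be word characters only
    // (?<value>...) curly brace grouping, assumed not to contain a nested '}'
    private static readonly Regex ChunkPattern = new Regex(
        @"\b(?:(?<op>(?:and|or)(?:\s+not)?|not)\s+)?(?<name>\w+)(?<value>\{[^}]*\})");

    public static List<(string Op, string Name, string Value)> Parse(string input)
    {
        var chunks = new List<(string Op, string Name, string Value)>();
        foreach (Match m in ChunkPattern.Matches(input))
        {
            // A missing operator (first chunk only, per the question) is treated as "and".
            string op = m.Groups["op"].Success ? m.Groups["op"].Value : "and";
            chunks.Add((op, m.Groups["name"].Value, m.Groups["value"].Value));
        }
        return chunks;
    }
}
```

For example, ChunkParser.Parse("not BATTCOMPAR{275} and OKED{true}") returns two entries: ("not", "BATTCOMPAR", "{275}") and ("and", "OKED", "{true}"). If you only need the whole chunk, m.Value of the same pattern gives it without the leading space.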