I have to admit, I'm very much a beginner when it comes to regular expressions. I have an app written in C# that searches text files for certain regex patterns. I'm not sure how to explain my problem, so I'll go straight to an example.
My text:
DeviceNr : 30
DeviceClass = ABC
UnitNr = 1
Reference = 29
PhysState = ENABLED
LogState = OPERATIVE
DevicePlan = 702
Manufacturer = CDE
Model = EFG
ready
DeviceNr : 31
DeviceClass = ABC
UnitNr = 9
Reference = 33
PhysState = ENABLED
LogState = OPERATIVE
Manufacturer = DDD
Model = XYZ
Description = something here
ready
I need to match a multiline block that starts with the word "DeviceNr", ends with "ready", and contains both "DeviceClass = ABC" and "Model = XYZ". I can only assume that these lines appear in this exact order; I cannot assume what is between them, not even the number of lines in between. I tried the regex below, but it matched the whole text instead of only the DeviceNr : 31 block:
DeviceNr : ([0-9]+)(?:.*?\n)*? DeviceClass = ABC(?:.*?\n)*? Model = XYZ(?:.*?\n)*?ready\n\n
If you know that "DeviceClass = ABC" and "Model = XYZ" are present and in that order, you can also make use of a lookahead assertion on a per-line basis: first match all lines that do not start with, for example, DeviceClass =, then match the line that does. Do the same for Model and ready.
^\s*DeviceNr : ([0-9]+)(?:\r?\n(?!\s*DeviceClass =).*)*\r?\n\s*DeviceClass = ABC\b(?:\r?\n(?!\s*Model =).*)*\r?\n\s*Model = XYZ\b(?:\r?\n(?!\s*ready).*)*\r?\n\s*ready\b
^
Start of a line (this assumes RegexOptions.Multiline; without it, ^ only matches at the start of the string)
\s*DeviceNr : ([0-9]+)
Match optional whitespace chars and DeviceNr : then capture 1+ digits 0-9 in group 1
(?:
Non-capturing group
\r?\n(?!\s*DeviceClass =).*
Match a newline, assert that the next line does not start with DeviceClass = (after optional whitespace), and match the rest of that line
)*
Close the non-capturing group and optionally repeat it, as you don't know how many lines there are
\r?\n\s*DeviceClass = ABC\b
Match a newline, optional whitespace chars and DeviceClass = ABC
(?:\r?\n(?!\s*Model =).*)*\r?\n\s*Model = XYZ\b
The same approach for Model =
(?:\r?\n(?!\s*ready).*)*\r?\n\s*ready\b
And the same approach for ready
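Since the app is written in C#, here is a minimal sketch of how the pattern above could be applied. The class and method names (DeviceBlockFinder, FindDeviceNumbers) are made up for illustration, and the sketch assumes the file content is already loaded into a string. RegexOptions.Multiline is passed so that ^ can match at the start of any line, not only at the start of the text.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original app.
public static class DeviceBlockFinder
{
    // The pattern from the answer, split over several literals for readability.
    // Multiline lets ^ match at the start of every line.
    private static readonly Regex BlockPattern = new Regex(
        @"^\s*DeviceNr : ([0-9]+)" +
        @"(?:\r?\n(?!\s*DeviceClass =).*)*\r?\n\s*DeviceClass = ABC\b" +
        @"(?:\r?\n(?!\s*Model =).*)*\r?\n\s*Model = XYZ\b" +
        @"(?:\r?\n(?!\s*ready).*)*\r?\n\s*ready\b",
        RegexOptions.Multiline);

    // Returns the captured DeviceNr (group 1) of every matching block.
    public static List<string> FindDeviceNumbers(string text)
    {
        var numbers = new List<string>();
        foreach (Match m in BlockPattern.Matches(text))
        {
            numbers.Add(m.Groups[1].Value);
        }
        return numbers;
    }

    public static void Main()
    {
        // The sample text from the question.
        string text = string.Join("\n", new[]
        {
            "DeviceNr : 30", "DeviceClass = ABC", "UnitNr = 1", "Reference = 29",
            "PhysState = ENABLED", "LogState = OPERATIVE", "DevicePlan = 702",
            "Manufacturer = CDE", "Model = EFG", "ready",
            "DeviceNr : 31", "DeviceClass = ABC", "UnitNr = 9", "Reference = 33",
            "PhysState = ENABLED", "LogState = OPERATIVE", "Manufacturer = DDD",
            "Model = XYZ", "Description = something here", "ready"
        });

        foreach (string nr in FindDeviceNumbers(text))
        {
            Console.WriteLine(nr); // prints 31
        }
    }
}
```

The block for DeviceNr : 30 is rejected because the lines after DeviceClass = ABC can only be consumed up to the first line starting with Model =, and that line is Model = EFG rather than Model = XYZ.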
Note that \s can also match a newline. If you want to prevent that, you can use [^\S\r\n] instead to match a whitespace char without a newline.
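To illustrate the difference, here is a small sketch in C#. The helper name FirstMatchIndex and the input string are made up for the example; the input has an empty line directly before an indented "ready".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, only for illustration.
public static class WhitespaceDemo
{
    // Returns the index where the first match starts, or -1 if there is no match.
    public static int FirstMatchIndex(string pattern, string text)
    {
        Match m = Regex.Match(text, pattern, RegexOptions.Multiline);
        return m.Success ? m.Index : -1;
    }

    public static void Main()
    {
        // Index 0 is 'a', 1 and 2 are newlines, 3 and 4 are spaces, "ready" starts at 5.
        string text = "a\n\n  ready";

        // \s* also consumes the newline of the empty line, so the match starts there.
        Console.WriteLine(FirstMatchIndex(@"^\s*ready\b", text));        // prints 2

        // [^\S\r\n]* only consumes spaces and tabs, so the match starts on the "ready" line.
        Console.WriteLine(FirstMatchIndex(@"^[^\S\r\n]*ready\b", text)); // prints 3
    }
}
```

In the pattern above, this would mean replacing each \s* with [^\S\r\n]* if leading empty lines should never become part of a match.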