I'm trying to parse some fields from a multi-line file. I'm only interested in some of the lines and would like to skip the others. Here is an example of something similar to what I'm trying to do:
from pyparsing import *
string = "field1: 5\nfoo\nbar\nfield2: 42"
value1 = Word(nums)("value1")
value2 = Word(nums)("value2")
not_field2 = Regex(r"^(?!field2:).*$")
expression = "field1:" + value1 + LineEnd() + OneOrMore(not_field2) + "field2:" + value2 + LineEnd()
tokens = expression.parseString(string)
print(tokens["value1"])
print(tokens["value2"])
where the Regex for a line not starting with field2: is adapted from "Regular expression for a string that does not start with a sequence". However, running this example script gives a
pyparsing.ParseException: Expected Re:('^(?!field2:).*$') (at char 10), (line:2, col:1)
I would like the value2 to end up as 42, regardless of the number of lines (foo\n and bar\n in this case). How can I achieve that?
The '^' and '$' characters in your Regex aren't interpreted on a line-by-line basis by pyparsing, but in the context of the whole string being parsed. So '^' will match only at the very beginning of the string and '$' only at the very end.
Instead you can do:
not_field2 = LineStart() + Regex(r"(?!field2:).*")
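To see why the anchors get in the way, here is a minimal sketch using only the standard re module. It assumes that pyparsing's Regex applies the compiled pattern at the current parse offset, roughly like pattern.match(text, pos); that is an assumption about the internals, and the variable names are purely illustrative.

```python
import re

text = "field1: 5\nfoo\nbar\nfield2: 42"
foo_start = text.index("foo")          # offset 10: line 2, column 1
field2_start = text.index("field2:")   # start of the last line

# Pattern from the question. Without re.MULTILINE, '^' only matches at the
# real start of the string, so a match attempted at offset 10 fails.
plain = re.compile(r"^(?!field2:).*$")
plain_match = plain.match(text, foo_start)

# Same lookahead with the anchors dropped. '.' does not cross newlines,
# so the match covers the rest of the current line only.
unanchored = re.compile(r"(?!field2:).*")
unanchored_match = unanchored.match(text, foo_start)

# The negative lookahead still rejects a line that starts with "field2:".
blocked_match = unanchored.match(text, field2_start)

print(plain_match)
print(unanchored_match.group())
print(blocked_match)
```

The unanchored pattern has no notion of line starts by itself, which is why it is paired with LineStart() above.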