In PyParsing, how to specify one or more lines which do not start with a certain string?


I'm trying to parse some fields from a multi-line file. I'm only interested in some of its lines and would like to skip the rest. Here is a simplified example of what I'm trying to do:

from pyparsing import *

string = "field1: 5\nfoo\nbar\nfield2: 42"

value1 = Word(nums)("value1")
value2 = Word(nums)("value2")
not_field2 = Regex(r"^(?!field2:).*$")

expression = "field1:" + value1 + LineEnd() + OneOrMore(not_field2) + "field2:" + value2 + LineEnd()

tokens = expression.parseString(string)

print(tokens["value1"])
print(tokens["value2"])

where the Regex for a line not starting with field2: is adapted from "Regular expression for a string that does not start with a sequence". However, running this example script raises:

pyparsing.ParseException: Expected Re:('^(?!field2:).*$') (at char 10), (line:2, col:1)

I would like the value2 to end up as 42, regardless of the number of lines (foo\n and bar\n in this case). How can I achieve that?


Solution

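  • The '^' and '$' anchors in your Regex aren't interpreted line by line by pyparsing; they are interpreted in the context of the whole string being parsed. Without re.MULTILINE, '^' matches only at the very beginning of that string and '$' only at the very end (or just before a final trailing newline). That is why the match fails at char 10, the start of the second line: '^' cannot match there.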

    Instead you can do:

    not_field2 = LineStart() + Regex(r"(?!field2:).*")
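
The anchor behavior described above can be checked with the standard re module alone. This is a minimal sketch, assuming pyparsing's Regex applies the compiled pattern at the current parse position, roughly like pattern.match(text, pos); the variable names are made up for illustration.

```python
import re

text = "field1: 5\nfoo\nbar\nfield2: 42"
foo_pos = text.index("foo")          # 10, the "char 10" from the ParseException
field2_pos = text.index("field2:")   # start of the last line

# Assumption: this mimics matching a pattern at a given offset of the full string.
anchored = re.compile(r"^(?!field2:).*")
unanchored = re.compile(r"(?!field2:).*")
multiline = re.compile(r"^(?!field2:).*$", re.MULTILINE)

# '^' without MULTILINE only matches at index 0, so this fails on line 2.
anchored_at_foo = anchored.match(text, foo_pos)

# Without the anchor, the negative lookahead alone does the filtering.
unanchored_at_foo = unanchored.match(text, foo_pos)
unanchored_at_field2 = unanchored.match(text, field2_pos)

# With MULTILINE, '^' and '$' are applied per line.
multiline_at_foo = multiline.match(text, foo_pos)

print(anchored_at_foo)            # None
print(unanchored_at_foo.group())  # foo
print(unanchored_at_field2)       # None
print(multiline_at_foo.group())   # foo
```

Dropping '^' and '$' from the pattern, as in the suggestion above, leaves the lookahead to reject lines that begin with field2:, while LineStart() is what ties the match to the start of a line.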