I'm looking at this library, which has little documentation: https://pythonhosted.org/parsec/#examples
I understand there are alternatives, but I'd like to use this library.
I have the following string I'd like to parse:
mystr = """
<kv>
key1: "string"
key2: 1.00005
key3: [1,2,3]
</kv>
<csv>
date,windspeed,direction
20190805,22,NNW
20190805,23,NW
20190805,20,NE
</csv>"""
While I'd like to parse the whole thing, I'd settle for just grabbing the <tags>. I have:
>>> import parsec
>>> tag_start = parsec.Parser(lambda x: x == "<")
>>> tag_end = parsec.Parser(lambda x: x == ">")
>>> tag_name = parsec.Parser(parsec.Parser.compose(parsec.many1, parsec.letter))
>>> tag_open = parsec.Parser(parsec.Parser.joint(tag_start, tag_name, tag_end))
OK, looks good. Now to use it:
>>> tag_open.parse(mystr)
Traceback (most recent call last):
...
TypeError: <lambda>() takes 1 positional argument but 2 were given
This fails, and I don't understand why the error says my lambda was given two arguments when it clearly takes one. How can I proceed?
My optimal desired output for all the bonus points is:
[
  {"type": "tag",
   "name": "kv",
   "values": [
     {"key1": "string"},
     {"key2": 1.00005},
     {"key3": [1,2,3]}
   ]
  },
  {"type": "tag",
   "name": "csv",
   "values": [
     {"date": 20190805, "windspeed": 22, "direction": "NNW"},
     {"date": 20190805, "windspeed": 23, "direction": "NW"},
     {"date": 20190805, "windspeed": 20, "direction": "NE"}
   ]
  }
]
The output I'd settle for in this question, using functions like the start and end tag parsers above, is:
[
{"tag": "kv"},
{"tag" : "csv"}
]
That is, I simply want to be able to parse arbitrary XML-like tags out of the messy mixed text.
I encourage you to define your own parser using those combinators, rather than construct the Parser directly.
If you want to construct a Parser by wrapping a function, then, as the documentation states, the fn should accept two arguments: the first is the text and the second is the current position. The fn should also return a Value, built with Value.success or Value.failure, rather than a boolean. You can grep for @Parser in parsec/__init__.py in this package to find more examples of how it works.
For your case in the description, you could define the parser as follows:
import re
from parsec import *

spaces = regex(r'\s*', re.MULTILINE)
name = regex(r'[_a-zA-Z][_a-zA-Z0-9]*')

# Opening and closing tags; both return the tag name.
tag_start = spaces >> string('<') >> name << string('>') << spaces
tag_stop = spaces >> string('</') >> name << string('>') << spaces

@generate
def header_kv():
    # One "key: value" line; the value is kept as raw text up to the newline.
    key = yield spaces >> name << spaces
    yield string(':')
    value = yield spaces >> regex('[^\n]+')
    return {key: value}

@generate
def header():
    tag_name = yield tag_start
    values = yield sepBy(header_kv, string('\n'))
    tag_name_end = yield tag_stop
    assert tag_name == tag_name_end
    return {
        'type': 'tag',
        'name': tag_name,
        'values': values
    }

@generate
def body():
    tag_name = yield tag_start
    # Rows are separated by newlines, cells within a row by commas.
    values = yield sepBy(sepBy1(regex(r'[^\n<,]+'), string(',')), string('\n'))
    tag_name_end = yield tag_stop
    assert tag_name == tag_name_end
    return {
        'type': 'tag',
        'name': tag_name,
        'values': values
    }

parser = header + body
If you run parser.parse(mystr), it yields
({'type': 'tag',
  'name': 'kv',
  'values': [{'key1': '"string"'},
             {'key2': '1.00005'},
             {'key3': '[1,2,3]'}]},
 {'type': 'tag',
  'name': 'csv',
  'values': [['date', 'windspeed', 'direction'],
             ['20190805', '22', 'NNW'],
             ['20190805', '23', 'NW'],
             ['20190805', '20', 'NE']]}
)
You can refine the definition of values in the above code to get the result in the exact form you want.
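As one possible refinement, the raw strings can be converted after parsing. The sketch below uses only the standard library and works on data copied from the output shown above; convert_scalar, refine_kv and refine_csv are hypothetical helper names, and using ast.literal_eval for the conversion is an assumption about which value syntax you want to accept.

```python
import ast

def convert_scalar(raw):
    """Turn '1.00005', '[1,2,3]' or '"string"' into a Python value."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw  # bare words such as NNW are not literals; keep the text

def refine_kv(values):
    # [{'key2': '1.00005'}, ...] -> [{'key2': 1.00005}, ...]
    return [{k: convert_scalar(v) for k, v in pair.items()} for pair in values]

def refine_csv(rows):
    # The first row is the header; every later row becomes one dict.
    header, *data = rows
    return [dict(zip(header, map(convert_scalar, row))) for row in data]

# Sample data copied from the parser output above.
kv_values = [{'key1': '"string"'}, {'key2': '1.00005'}, {'key3': '[1,2,3]'}]
csv_values = [['date', 'windspeed', 'direction'],
              ['20190805', '22', 'NNW'],
              ['20190805', '23', 'NW'],
              ['20190805', '20', 'NE']]

print(refine_kv(kv_values))
print(refine_csv(csv_values))
```

The same functions could be applied to the 'values' entries of the two dicts returned by parser.parse(mystr), or moved into header() and body() just before the return.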