>>> from pyparsing import Word, alphanums, OneOrMore, Optional, Suppress
>>> var = Word(alphanums)
>>> reg = OneOrMore(var('predictors') + Optional(Suppress('+'))) + '~' + OneOrMore(var('covariates') + Optional(Suppress('+')))
>>> string = 'y1 ~ f1 + f2 + f3'
>>> reg.parseString(string)
(['y1', '~', 'f1', 'f2', 'f3'], {'predictors': ['y1'], 'covariates': ['f1', 'f2', 'f3']})
It parses the string correctly, but I am unable to get all the values of predictors and covariates. It only seems to store the last value:
>>> results = reg.parseString(string)
>>> results.covariates
'f3'
>>> results['covariates']
'f3'
I would like to get all the values in predictors and covariates as lists. Any ideas why this is happening?
By default, results names follow logic similar to Python dicts: if multiple values are assigned to the same key, only the last assigned value is kept.
However, this behavior can be overridden, depending on how the parser defines the results names.
If using the full expr.setResultsName("XYZ") form, add the listAllMatches=True argument. This tells pyparsing to keep all parsed values for that name and return them as a list.
If using the short-cut expr("XYZ") form, add a '*' to the end of the name: expr("XYZ*"). This is equivalent to setting listAllMatches to True.
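Applied to the grammar from the question, the short-cut form could look like the sketch below. The helper name term_sep is introduced here only for readability and is not part of the original code; asList() is used to turn the returned ParseResults into a plain Python list.

```python
from pyparsing import Word, alphanums, OneOrMore, Optional, Suppress

var = Word(alphanums)

# Hypothetical helper name: an optional '+' separator that is dropped from the output.
term_sep = Optional(Suppress('+'))

# The trailing '*' on each results name asks pyparsing to keep every match,
# instead of only the last one.
reg = (OneOrMore(var('predictors*') + term_sep)
       + '~'
       + OneOrMore(var('covariates*') + term_sep))

results = reg.parseString('y1 ~ f1 + f2 + f3')

# Each named result is now a ParseResults holding all matches.
predictors = results['predictors'].asList()
covariates = results['covariates'].asList()

print(predictors)  # ['y1']
print(covariates)  # ['f1', 'f2', 'f3']
```

Attribute access (results.covariates) should return the same accumulated values as the dict-style lookup.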
The trailing '*' is there in setResultsName for those cases where you use the short form of setResultsName: expr("name*") vs expr.setResultsName("name", listAllMatches=True). If you prefer calling setResultsName, then do not use the '*' notation, but instead pass the listAllMatches argument.
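As a minimal side-by-side sketch of the three spellings, using a toy grammar (the names word and "w" are made up for illustration):

```python
from pyparsing import Word, alphas, OneOrMore

word = Word(alphas)

# Default: later matches overwrite earlier ones under the same name.
last_only = OneOrMore(word.setResultsName("w")).parseString("a b c")

# Full form: pass listAllMatches=True explicitly.
full_form = OneOrMore(
    word.setResultsName("w", listAllMatches=True)
).parseString("a b c")

# Short form: the trailing '*' requests the same behavior.
short_form = OneOrMore(word("w*")).parseString("a b c")

last_value = last_only["w"]
full_list = full_form["w"].asList()
short_list = short_form["w"].asList()

print(last_value)  # c
print(full_list)   # ['a', 'b', 'c']
print(short_list)  # ['a', 'b', 'c']
```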