I have participant survey data that contains, for each variable: the variable name, its value in that observation, and the conditions required for that question to have been asked (if earlier answers establish that a question is not applicable, the participant won't be prompted). One of my tasks is to differentiate the blanks that mean N/A from the blanks that represent an asked-but-unanswered prompt. Unfortunately, the export function in our data capture tool doesn't offer this feature.
To get around this, I'm comparing the conditions for each variable's branching to the recorded observation and seeing whether the prompt should have displayed. That is probably confusing, so as an example imagine a record for Subject A:
Variable Name | Observation Value | Branching Logic
foo | 5 |
bar | 2 | foo != 2
baz | 7 | foo < 10 or bar == 5
The prompt for foo shows up no matter what; the prompt for bar will show because foo = 5 satisfies its condition foo != 2, and similarly baz will be observed. I'm treating it as a pandas dataframe, so I'm using a dict to represent the test data while I build a toy version of the module. I almost have it working but am missing one piece: nested parentheses.
There are a lot of similar questions (e.g. pyparsing and line breaks), and I found a very similar example in the pyparsing documentation that handles logical notation, but I'm not great at Python and had trouble following the use of multiple classes, child classes, etc. I was able to use that as a jumping-off point for the following:
import pyparsing as pp

test_data = {
    'a': 3,
    'b': 6,
    'c': 2,
    'd': 4
}

# Functions applied by parser
def toInt(x):
    return [int(k) for k in x]

def useKey(x):
    try:
        return [test_data[k] for k in x]
    except KeyError:
        print("Value not a key:", x)

def checkCond(parsed):
    allinone = parsed[0]
    print("Condition:", allinone)
    humpty = " ".join([str(x) for x in allinone])
    return eval(humpty)

# Building the parser
key = pp.Word(pp.alphanums + '_')('key')
op = pp.oneOf('> >= == != <= <')('op')
val = pp.Word(pp.nums + '-')('value')
joint = pp.oneOf("and or")
key.setParseAction(useKey)
val.setParseAction(toInt)
cond = pp.Group(key + op + val)('condition')
cond.addParseAction(checkCond)
logic = cond + pp.Optional(joint) + pp.Optional(cond)
# Tests
if __name__ == "__main__":
    tests = [
        ("a == 5", False),
        ("b < 3", False),
        ("c > 1", True),
        ("d != 2", True),
        ("a >= 1", True),
        ("b <= 5", False),
        ("a <= 6 and b == 2", False),
        ("a <= 6 or b == 2", True)]
        #("b > 2 and (a == 3 or d > 2 or c < 1)", True)]
    for expr, res in tests:
        print(expr)
        out = logic.parseString(expr)
        out = " ".join([str(x) for x in out])
        out = bool(eval(out))
        if bool(out) == bool(res):
            print("PASS\n")
        else:
            print("FAIL\n",
                  "Got:", bool(out),
                  "\nExpected:", bool(res), "\n")
After a lot of trial and error I'm getting the results I expected out of this. Notice that the last test is commented out, though; if you uncomment that and run it, you get:
b > 2 and (a == 3 or d > 2 or c < 1)
Condition: [6, '>', 2]
Traceback (most recent call last):
  File "testdat/pptutorial.py", line 191, in <module>
    out = bool(eval(out))
  File "<string>", line 1
    True and
           ^
SyntaxError: unexpected EOF while parsing
I'm sure it's something very silly that I'm missing but for the life of me I cannot figure this piece out. It seems like the parentheses make the parser think it's the start of a new statement. There are other answers that suggest looking for empty values, printing out the individual tokens, etc., but I haven't had any luck that way. My guess is it's something with how I've set up the groups in the parser. I've never built one before so this is definitely uncharted territory for me! Thanks a ton for any help and let me know if there's more information I can provide.
No part of your grammar allows parentheses in the input, which is why pyparsing stops parsing once it encounters a parenthesis.
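To see the silent stop in isolation, here is a minimal sketch using a simplified stand-in for your grammar (the names `simple_cond` and `simple_logic` are illustrative and not from your code, and the parse actions are left out). By default `parseString` returns whatever prefix matched and ignores the rest; passing `parseAll=True` turns the leftover input into a `ParseException`.

```python
import pyparsing as pp

# Simplified stand-in for the question's grammar: no parse actions, no parentheses.
name = pp.Word(pp.alphas + "_")
number = pp.Word(pp.nums)
simple_cond = pp.Group(name + pp.oneOf("> >= == != <= <") + number)
simple_logic = simple_cond + pp.Optional(pp.oneOf("and or")) + pp.Optional(simple_cond)

text = "b > 2 and (a == 3 or d > 2)"

# Default behaviour: the parser stops quietly at "(" and returns the matched prefix.
partial = simple_logic.parseString(text).asList()
print(partial)

# With parseAll=True the unparsed remainder is reported as an error instead.
try:
    simple_logic.parseString(text, parseAll=True)
    failed = False
except pp.ParseException as exc:
    failed = True
    print("ParseException:", exc)
```

The prefix that comes back ends in a dangling `and`, which is the same shape as the `True and` string that `eval` choked on in your traceback.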
You can allow parentheses around conditions with a small tweak to your definition of logic:
cond_chain_with_parentheses = pp.Forward()
cond_chain = cond + pp.Optional(joint + cond_chain_with_parentheses)
cond_chain_with_parentheses <<= cond_chain | '(' + cond_chain + ')'
logic = cond_chain_with_parentheses + pp.StringEnd()
Here, I've used a forward declaration of cond_chain_with_parentheses, which allows me to use it in the grammar definition even though it's not defined yet. I've also added StringEnd so that an exception is raised if the entire input cannot be parsed.
This grammar can parse all of your inputs correctly:
>>> logic.parseString("b > 2 and (a == 3 or d > 2 or c < 1)")
Condition: [6, '>', 2]
Condition: [3, '==', 3]
Condition: [4, '>', 2]
Condition: [2, '<', 1]
([True, 'and', '(', True, 'or', True, 'or', False, ')'], {'condition': [True, True, True, False]})
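If you later need a parenthesized group at the start of an expression, or groups nested more than one level deep, the grammar above will most likely reject them (for example, `(a == 3 or d > 2) and b > 2` has nothing in the grammar to consume the trailing `and b > 2`). One alternative is pyparsing's `infixNotation` helper, which builds the recursive grammar for you. The following is only a sketch under a few assumptions: `COMPARE`, `eval_cond`, `eval_chain` and `check` are hypothetical names I made up, the comparisons go through the `operator` module rather than `eval`, and `and` is given precedence over `or` as in Python.

```python
import operator
import pyparsing as pp

# Same toy record as in the question.
test_data = {"a": 3, "b": 6, "c": 2, "d": 4}

# Hypothetical lookup table: comparison symbol -> function from the stdlib.
COMPARE = {
    ">": operator.gt, ">=": operator.ge, "==": operator.eq,
    "!=": operator.ne, "<=": operator.le, "<": operator.lt,
}

def eval_cond(tokens):
    """Turn one grouped condition such as ['a', '==', '3'] into a bool."""
    name, symbol, value = tokens[0]
    return COMPARE[symbol](test_data[name], int(value))

def eval_chain(tokens):
    """Fold [bool, 'and'/'or', bool, ...] from left to right."""
    items = list(tokens[0])
    result = items[0]
    for joint, operand in zip(items[1::2], items[2::2]):
        result = (result and operand) if joint == "and" else (result or operand)
    return result

key = pp.Word(pp.alphas + "_", pp.alphanums + "_")
op = pp.oneOf("> >= == != <= <")
val = pp.Regex(r"-?\d+")
cond = pp.Group(key + op + val).setParseAction(eval_cond)

# Earlier entries bind tighter, so "and" takes precedence over "or".
# infixNotation adds the parenthesis handling itself.
logic = pp.infixNotation(cond, [
    (pp.Keyword("and"), 2, pp.opAssoc.LEFT, eval_chain),
    (pp.Keyword("or"), 2, pp.opAssoc.LEFT, eval_chain),
])

def check(expr):
    # parseAll=True rejects trailing input instead of silently ignoring it.
    return bool(logic.parseString(expr, parseAll=True)[0])

print(check("b > 2 and (a == 3 or d > 2 or c < 1)"))
print(check("(a == 3 or d > 2) and b > 2"))
```

Because the parse actions here return booleans directly, there is no string to rebuild and `eval` afterwards. Keep in mind that pyparsing may call parse actions more than once while backtracking, so they should stay free of side effects.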