Search code examples
pythonregexparsingnlpbooleanquery

Manipulate Nested Boolean Query String in Python


I have string boolean queries like this

   queryString= """And(
                      OR(abc,xyz,wxy),
                      AND(AND(xyz,wxy),xzy),
                      XOR(x1,y1, AND(xy,zz))  
                      )"""

At current it is hard for me to modify the above query string, as I want to

  1. Add another OR(x3,y3) in the last XOR
  2. Remove entire OR(abc,xyz,wxy)

with desired output

   resultQueryString= """And(                        
                            AND(AND(xyz,wxy),xzy),
                            XOR(x1,y1, AND(xy,zz),OR(x3,y3))  
                            )"""

I think I cannot easily do it unless I come up with a sophisticated regex for each different query.

I am trying to write a python function which would take above string boolean query as input and output a tree data structure.

So that I can traverse the tree and evaluate or change whatever portion of query I want to change.

In above example, if I had it as a tree, I can easily see the root is AND and traverse/modify other branches so on.


Solution

  • The ast.parse function seems to do almost exactly what you want:

    ast.dump(ast.parse("""And(                        
                            AND(AND(xyz,wxy),xzy),
                            XOR(x1,y1, AND(xy,zz),OR(x3,y3))  
                            )""").body[0].value)
    

    Call(func=Name(id='And', ctx=Load()), args=[Call(func=Name(id='AND', ctx=Load()), args=[Call(func=Name(id='AND', ctx=Load()), args=[Name(id='xyz', ctx=Load()), Name(id='wxy', ctx=Load())], keywords=[], starargs=None, kwargs=None), Name(id='xzy', ctx=Load())], keywords=[], starargs=None, kwargs=None), Call(func=Name(id='XOR', ctx=Load()), args=[Name(id='x1', ctx=Load()), Name(id='y1', ctx=Load()), Call(func=Name(id='AND', ctx=Load()), args=[Name(id='xy', ctx=Load()), Name(id='zz', ctx=Load())], keywords=[], starargs=None, kwargs=None), Call(func=Name(id='OR', ctx=Load()), args=[Name(id='x3', ctx=Load()), Name(id='y3', ctx=Load())], keywords=[], starargs=None, kwargs=None)], keywords=[], starargs=None, kwargs=None)], keywords=[], starargs=None, kwargs=None)

    (the .body[0].value removes two pointless layers of abstraction, and the .dump is just for output.

    Here is the code that then does the transformations you requested on the output:

    class Filterer(ast.NodeTransformer):
        def visit_Call(self, node):
                name=node.func.id
                if name == "OR" and len(node.args) == 3:
                        return None
                elif name == "XOR":
                        args = [ast.Name("x3",ast.Load()),
                                ast.Name("y3",ast.Load())]
                        func = ast.Name("OR",ast.Load())
                        node.args.append(ast.Call(func, args, [], None, None))
                return self.generic_visit(node)
    

    and here is the code that prints the result in your format with the exception of whitespace: (Python doesn't have a builtin in its ast module for this):

    class Printer(ast.NodeVisitor):
        def visit_Call(self, node):
                self.visit(node.func)
                print("(",end="")
                comma = False
                for arg in node.args:
                        if comma:
                                print(",",end="")
                        comma=True
                        self.visit(arg)
                print(")",end="")
        def visit_Name(self, node):
                print(node.id,end="")
    

    Thus, the final code would be:

    Printer().visit(Filterer().visit(ast.parse(queryString)))