
Cythonized pyparsing parser not working properly: parse actions get the wrong argument count


I have a Python project that parses some assembler code:

asm_parser/
  - asm.py
  - AST.py
  - obj_code.py
  ...

In the grammar below, I set this class as the parse action to run on a successful match (its __init__ function receives the tokens):

self.dir_map_code_fp = pp.OneOrMore(...).setParseAction(Body)

In AST.py, Body.__init__() receives the tokens:

class Body(Node):
    def __init__(self, tokens):
        super(Body,self).__init__()
        self.code = tokens

Then I call parseString() on the grammar, passing the contents of the input file as a string:

self.parser_asm.parseString(string, parseAll=True)

To hide the source, I convert these Python files to .so files using cythonize. Below is the setup.py I use to build the .so files:

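import shutil
from pathlib import Path  # on Python 2.7 this needs a pathlib backport

from Cython.Build import cythonize
from setuptools import Extension, setup
from setuptools.command.build_ext import build_ext


class MyBuildExt(build_ext):
    def run(self):
        build_ext.run(self)
        build_dir = Path(self.build_lib)
        root_dir = Path(__file__).parent
        target_dir = build_dir if not self.inplace else root_dir
        # Copy the package's entry-point files next to the compiled modules
        self.copy_package_file(Path('assembler') / '__init__.py', root_dir, target_dir)
        self.copy_package_file(Path('assembler') / '__main__.py', root_dir, target_dir)

    # Not named copy_file: that would shadow Command.copy_file(), which
    # setuptools' build_ext itself calls (with another signature) for --inplace
    def copy_package_file(self, path, source_dir, destination_dir):
        if not (source_dir / path).exists():
            return
        shutil.copyfile(str(source_dir / path), str(destination_dir / path))

if __name__ == '__main__':
    ext_modules = [
        Extension(...) for f in files
    ]

    setup(
        name="myasm",
        ext_modules=cythonize(ext_modules, nthreads=8),
        cmdclass=dict(build_ext=MyBuildExt),
        packages=["asm"]
    )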

After building the .so files, I created a run_asm.py wrapper script to run the assembler. It imports the compiled .so modules:

import argparse
from asm import Preprocessor

if __name__ == "__main__":
    argParser = argparse.ArgumentParser(description='Assembler')
    argParser.add_argument('-asm', '--asm', required=True, help="Assembly file")
    argParser.add_argument('-outdir', '--outdir', required=False, default='.', help="default_img directory")
    args = argParser.parse_args()
    prep = Preprocessor()
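Because the behavior differs between the pure-Python and the compiled modules, it can help to confirm which one the wrapper actually imported. A minimal sketch, assuming the module's __file__ points at the file it was loaded from; is_compiled_extension is a hypothetical helper, not part of the project.

```python
import types

try:  # Python 3
    from importlib.machinery import EXTENSION_SUFFIXES
except ImportError:  # Python 2.7 fallback
    import imp
    EXTENSION_SUFFIXES = [suffix for suffix, _, kind in imp.get_suffixes()
                          if kind == imp.C_EXTENSION]


def is_compiled_extension(module):
    """Return True if module was loaded from a compiled extension file (.so/.pyd)."""
    path = getattr(module, "__file__", None) or ""
    return path.endswith(tuple(EXTENSION_SUFFIXES))


# Demo with a stand-in module object (no real .so needed)
fake = types.ModuleType("AST")
fake.__file__ = "/tmp/asm_parser/AST" + EXTENSION_SUFFIXES[0]
print(is_compiled_extension(fake))   # True
print(is_compiled_extension(types))  # False: types is a plain .py module
```

In run_asm.py you would call it on the module that defines Body (and print its __file__) right after the imports.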

In pure Python form the project works. In the cythonized .so form, argument parsing, file reading and everything else work until the parse action calls Body.__init__(). __init__ takes only two arguments, but here it is given four:

Traceback (most recent call last):
  File "run_asm.py", line 30, in <module>
    prep.generate_ast(f, args.outdir)
  File "pkg/asm.py", line 145, in pkg.assembler.Preprocessor.generate_ast
  File "/u/nalaka/intelpython2/lib/python2.7/site-packages/pyparsing.py", line 1206, in parseString
    loc, tokens = self._parse( instring, 0 )
  File "/u/nalaka/intelpython2/lib/python2.7/site-packages/pyparsing.py", line 1072, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/u/nalaka/intelpython2/lib/python2.7/site-packages/pyparsing.py", line 2923, in parseImpl
    loc, tokens = self_expr_parse( instring, loc, doActions, callPreParse=False )
  File "/u/nalaka/intelpython2/lib/python2.7/site-packages/pyparsing.py", line 1072, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/u/nalaka/intelpython2/lib/python2.7/site-packages/pyparsing.py", line 2607, in parseImpl
    return e._parse( instring, loc, doActions )
  File "/u/nalaka/intelpython2/lib/python2.7/site-packages/pyparsing.py", line 1098, in _parseNoCache
    tokens = fn( instring, tokensStart, retTokens )
  File "/u/nalaka/intelpython2/lib/python2.7/site-packages/pyparsing.py", line 819, in wrapper
    ret = func(*args[limit[0]:])
  File "pkg/AST.py", line 28, in pkg.AST.Body.__init__
TypeError: __init__() takes exactly 2 positional arguments (4 given)

I looked at the pyparsing.py code. In the wrapper below, func is the parse action (the Body class, i.e. Body.__init__()). In the pure Python version limit[0] = 2, but in the cythonized version limit[0] = 0, so the number of arguments passed differs between the two versions. I couldn't find more information on this.

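def wrapper(*args):
    while 1:
        try:
            ret = func(*args[limit[0]:])
            foundArity[0] = True
            return ret
        except TypeError:
            # (abridged) re-raise if a call already succeeded or the TypeError came from inside func
            if limit[0] <= maxargs:
                limit[0] += 1
                continue
            raise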

Also, I found that setParseAction() accepts a callable taking 0 to 3 arguments: fn(s, loc, toks), fn(loc, toks), fn(toks), or just fn(). Could this have something to do with it (somehow messing up the argument counts)? Can anyone help me resolve this? I am using Intel Python 2.7, pyparsing 2.4.7 and Cython 0.25.2.
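One plausible explanation for the limit[0] difference, based on reading pyparsing 2.4.7's _trim_arity: a TypeError only triggers a retry with fewer arguments when the traceback shows it was raised by the call itself. A Cython-compiled __init__ appears to report its argument-count error with an extra traceback entry of its own (the "pkg.AST.Body.__init__" line in the traceback above), so pyparsing would treat it as a genuine error and re-raise it with all three arguments s, loc, toks (plus self, hence "4 given"). The sketch below is a simplified re-implementation of that logic, not pyparsing's actual code; trim_arity, Body and CythonLikeBody are illustrative names, and CythonLikeBody only imitates the compiled class by raising the error from inside its own frame.

```python
import sys
import traceback


def trim_arity(func, maxargs=2):
    """Simplified sketch of pyparsing's arity probing (not the library's code).

    The action is first called as func(s, loc, toks); on a TypeError raised
    by the call itself it is retried as func(loc, toks), func(toks), func().
    """
    limit = [0]            # number of leading arguments to drop
    found_arity = [False]  # set once a call succeeds

    def wrapper(*args):
        while True:
            try:
                ret = func(*args[limit[0]:])
                found_arity[0] = True
                return ret
            except TypeError:
                if found_arity[0]:
                    raise
                # Retry only if the traceback has no frame deeper than this
                # wrapper; a TypeError from inside func is a real error.
                if len(traceback.extract_tb(sys.exc_info()[2])) > 1:
                    raise
                if limit[0] <= maxargs:
                    limit[0] += 1
                    continue
                raise

    wrapper.limit = limit
    return wrapper


class Body(object):
    """Pure-Python parse action: the arity error is raised at the call site."""
    def __init__(self, tokens):
        self.code = tokens


class CythonLikeBody(object):
    """Imitates the compiled class: the arity error comes from its own frame."""
    def __init__(self, *args):
        if len(args) != 1:
            raise TypeError("__init__() takes exactly 2 positional arguments "
                            "(%d given)" % (len(args) + 1))
        self.code = args[0]


pure = trim_arity(Body)
node = pure("source text", 0, ["tok"])  # tries 3 args, then 2, then 1
print(node.code, pure.limit[0])          # ['tok'] 2

compiled = trim_arity(CythonLikeBody)
try:
    compiled("source text", 0, ["tok"])
except TypeError as exc:
    print(exc, compiled.limit[0])  # __init__() takes exactly 2 positional arguments (4 given) 0
```

Under this reading, the pure-Python class ends up at limit[0] = 2 and the compiled one stays at limit[0] = 0, which matches the values observed above.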


Solution

  • Even with Cython 0.29.17 I got the same error. This workaround will help if you are stuck with Python 2. Defining the function as def __init__(self, s, loc, tokens): does not help either; the error still appears, because different token sequences matching the same grammar call the function registered with setParseAction() with different numbers of arguments. Because of this dynamic behavior, I changed the function to accept a variable number of arguments. When the argument count is 2 (including self), the tokens are the second argument; when it is 4, they are the last. Either way the tokens are the last argument, so taking args[-1] is enough.

    class Body(Node):
        def __init__(self, *args):
            super(Body,self).__init__()
            # Called as Body(toks) or Body(s, loc, toks): the tokens are always last
            tokens = args[-1]
            self.code = tokens
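A quick check, outside pyparsing, that the *args version picks out the tokens for each calling convention pyparsing supports except fn(). Node here is a hypothetical stand-in for the project's own AST base class.

```python
class Node(object):
    """Hypothetical stand-in for the project's AST base class."""
    def __init__(self):
        self.children = []


class Body(Node):
    def __init__(self, *args):
        super(Body, self).__init__()
        # pyparsing may pass (toks), (loc, toks) or (s, loc, toks);
        # the tokens are always the last positional argument.
        self.code = args[-1]


toks = ["mov", "r1", "r2"]
for node in (Body(toks), Body(7, toks), Body("mov r1, r2", 7, toks)):
    assert node.code is toks
```

Since Body(s, loc, toks) now succeeds on the first attempt, pyparsing's probing should stop at limit[0] = 0, so the zero-argument form fn() is never tried and args[-1] always exists.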