Is it possible to parse non-trivial C enums with pyparsing?


I have a preprocessed C file and I need to enumerate the members of one of the enums inside it. pyparsing ships with a simple example for that (examples/cpp_enum_parser.py), but it only works when enum values are positive integers. In real life a value may be negative, hex, or a complex expression.

I don't need structured values, just the names.

enum hello {
    minusone=-1,
    par1 = ((0,5)),
    par2 = sizeof("a\\")bc};,"),
    par3 = (')')
};

When parsing a value, the parser should skip everything until it reaches one of the characters in [('",}] and then handle that character. Regex or SkipTo may be useful for the skipping, QuotedString for string and character literals, and Forward for nested parentheses (see examples/fourFn.py).


Solution

  • I altered the original example. I don't know why enum.ignore(cppStyleComment) was removed from the original script, so I put it back.

    from pyparsing import *
    # sample string with enums and other stuff
    sample = '''
        stuff before
        enum hello {
            Zero,
            One,
            Two,
            Three,
            Five=5,
            Six,
            Ten=10,
            minusone=-1,
            par1 = ((0,5)),
            par2 = sizeof("a\\")bc};,"),
            par3 = (')')
            };
        in the middle
        enum
            {
            alpha,
            beta,
            gamma = 10 ,
            zeta = 50
            };
        at the end
        '''
    
    # syntax we don't want to see in the final parse tree
    LBRACE,RBRACE,EQ,COMMA = map(Suppress,"{}=,")
    
    
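    lpar  = Literal( "(" )
    rpar  = Literal( ")" )
    # top level of a value: stop at quotes, '(' and the ',' or '}' that ends the value
    anything_topl = Regex(r"[^'\"(,}]+")
    # inside parentheses: ',' and '}' are ordinary text, only quotes and parens are special
    anything      = Regex(r"[^'\"()]+")
    
    # recursive grammar for balanced parentheses that may contain quoted strings
    expr = Forward()
    pths_or_str = quotedString | lpar + expr + rpar
    expr <<=    ZeroOrMore( pths_or_str | anything )
    expr_topl = ZeroOrMore( pths_or_str | anything_topl )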
    
    _enum = Suppress('enum')
    identifier = Word(alphas+'_',alphanums+'_')  # C identifiers may start with an underscore
    expr_topl_text = originalTextFor(expr_topl)
    enumValue = Group(identifier('name') + Optional(EQ + expr_topl_text('value')))
    enumList = Group(ZeroOrMore(enumValue + COMMA) + Optional(enumValue) )
    enum = _enum + Optional(identifier('enum')) + LBRACE + enumList('names') + RBRACE
    enum.ignore(cppStyleComment)
    
    # find instances of enums ignoring other syntax
    for item,start,stop in enum.scanString(sample):
        for entry in item.names:
            print('%s %s = %s' % (item.enum,entry.name, entry.value))
    

    Result:

    $ python examples/cpp_enum_parser.py
    hello Zero =
    hello One =
    hello Two =
    hello Three =
    hello Five = 5
    hello Six =
    hello Ten = 10
    hello minusone = -1
    hello par1 = ((0,5))
    hello par2 = sizeof("a\")bc};,")
    hello par3 = (')')
     alpha =
     beta =
     gamma = 10
     zeta = 50
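
Since the question only asks for the member names of one enum, the same grammar can be reduced to a small helper that matches each value only to skip it. This is a minimal sketch and not part of the original example script: enum_member_names is a hypothetical helper name, and using an empty string to select an anonymous enum is a convention assumed here.

```python
from pyparsing import (Forward, Group, Literal, Optional, Regex, Suppress,
                       Word, ZeroOrMore, alphanums, alphas, cppStyleComment,
                       quotedString)

LBRACE, RBRACE, EQ, COMMA = map(Suppress, "{}=,")
lpar, rpar = Literal("("), Literal(")")

# top level of a value: stop at a quote, '(' or the ',' / '}' that ends the value
anything_topl = Regex(r"[^'\"(,}]+")
# inside parentheses only quotes and parentheses are special
anything = Regex(r"[^'\"()]+")

# balanced parentheses that may contain quoted strings
expr = Forward()
pths_or_str = quotedString | lpar + expr + rpar
expr <<= ZeroOrMore(pths_or_str | anything)
expr_topl = ZeroOrMore(pths_or_str | anything_topl)

identifier = Word(alphas + "_", alphanums + "_")
# the value is matched only so it can be skipped; it is dropped from the results
enum_value = Group(identifier("name") + Optional(EQ + Suppress(expr_topl)))
enum_list = Group(ZeroOrMore(enum_value + COMMA) + Optional(enum_value))
enum_decl = (Suppress("enum") + Optional(identifier("enum"))
             + LBRACE + enum_list("names") + RBRACE)
enum_decl.ignore(cppStyleComment)


def enum_member_names(source, enum_name):
    """Return the member names of the first enum called enum_name.

    Hypothetical helper: pass "" to select the first anonymous enum.
    Returns an empty list when no matching enum is found.
    """
    for item, _start, _stop in enum_decl.scanString(source):
        if item.get("enum", "") == enum_name:
            return [entry["name"] for entry in item["names"]]
    return []
```

Calling enum_member_names(text, "hello") on the preprocessed source should then give a plain list of strings. Building the grammar once at module level avoids reconstructing it on every call.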