I have a preprocessed C file and I need to enumerate the members of one of the enums inside it. pyparsing ships with a simple example for that (examples/cpp_enum_parser.py), but it only works when the enum values are positive integers. In real code a value may be negative, hexadecimal, or a complex expression.
I don't need structured values, just the names. For example:
enum hello {
minusone=-1,
par1 = ((0,5)),
par2 = sizeof("a\\")bc};,"),
par3 = (')')
};
When parsing a value, the parser should skip everything up to one of the characters [('",}] and then handle those characters. Regex or SkipTo may be useful for the skipping, QuotedString for string and character literals, and Forward for nested parentheses (see examples/fourFn.py).
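The nested-parentheses part of that idea can be sketched in isolation. This is only a sketch, assuming pyparsing is installed; the names chunk, body, group, and balanced are made up here for illustration and do not come from the pyparsing examples.

```python
from pyparsing import (Forward, Literal, Regex, ZeroOrMore,
                       originalTextFor, quotedString)

# Any run of text that contains no quotes and no parentheses.
chunk = Regex(r"[^'\"()]+")

# body is declared first and defined afterwards so that it can refer to
# itself through group: that is what makes arbitrary nesting possible.
body = Forward()
group = Literal("(") + body + Literal(")")
body <<= ZeroOrMore(quotedString | group | chunk)

# originalTextFor yields the matched source text instead of a token list.
balanced = originalTextFor(group)

for text in ["((0,5)), next", "(')') };", r'("a\")bc};,"), x']:
    print(balanced.parseString(text)[0])
```

Quoted strings are tried before chunk and group, so a parenthesis inside a string or character literal is not counted as nesting.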
I altered the original example. I don't know why enum.ignore(cppStyleComment) was removed from the original script, so I put it back.
from pyparsing import *
# sample string with enums and other stuff
sample = '''
stuff before
enum hello {
Zero,
One,
Two,
Three,
Five=5,
Six,
Ten=10,
minusone=-1,
par1 = ((0,5)),
par2 = sizeof("a\\")bc};,"),
par3 = (')')
};
in the middle
enum
{
alpha,
beta,
gamma = 10 ,
zeta = 50
};
at the end
'''
# syntax we don't want to see in the final parse tree
LBRACE,RBRACE,EQ,COMMA = map(Suppress,"{}=,")
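lpar = Literal( "(" )
rpar = Literal( ")" )
# top level of a value: stop at quotes, "(", "," and "}"
anything_topl = Regex(r"[^'\"(,}]+")
# inside parentheses: "," and "}" are allowed, stop only at quotes and parentheses
anything = Regex(r"[^'\"()]+")
# recursive expression, so parentheses can nest to any depth
expr = Forward()
pths_or_str = quotedString | lpar + expr + rpar
expr <<= ZeroOrMore( pths_or_str | anything )
expr_topl = ZeroOrMore( pths_or_str | anything_topl )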
_enum = Suppress('enum')
identifier = Word(alphas,alphanums+'_')
expr_topl_text = originalTextFor(expr_topl)
enumValue = Group(identifier('name') + Optional(EQ + expr_topl_text('value')))
enumList = Group(ZeroOrMore(enumValue + COMMA) + Optional(enumValue) )
enum = _enum + Optional(identifier('enum')) + LBRACE + enumList('names') + RBRACE
enum.ignore(cppStyleComment)
# find instances of enums ignoring other syntax
for item,start,stop in enum.scanString(sample):
    for entry in item.names:
        print('%s %s = %s' % (item.enum,entry.name, entry.value))
Result:
$ python examples/cpp_enum_parser.py
hello Zero =
hello One =
hello Two =
hello Three =
hello Five = 5
hello Six =
hello Ten = 10
hello minusone = -1
hello par1 = ((0,5))
hello par2 = sizeof("a\")bc};,")
hello par3 = (')')
alpha =
beta =
gamma = 10
zeta = 50
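Since only the names of one particular enum are needed, the same grammar can be wrapped in a small helper. The following is a sketch under assumptions rather than part of the pyparsing example: the helper enum_names, its parameter wanted, and the input source are hypothetical, and the grammar is a condensed copy of the one above. The input also contains comments that look like enumerators, to show what enum.ignore(cppStyleComment) is for.

```python
from pyparsing import (Forward, Group, Literal, Optional, Regex, Suppress,
                       Word, ZeroOrMore, alphanums, alphas, cppStyleComment,
                       originalTextFor, quotedString)

# Condensed copy of the grammar from the script above.
LBRACE, RBRACE, EQ, COMMA = map(Suppress, "{}=,")
expr = Forward()
pths_or_str = quotedString | Literal("(") + expr + Literal(")")
expr <<= ZeroOrMore(pths_or_str | Regex(r"[^'\"()]+"))
expr_topl = ZeroOrMore(pths_or_str | Regex(r"[^'\"(,}]+"))
identifier = Word(alphas, alphanums + "_")
enumValue = Group(identifier("name")
                  + Optional(EQ + originalTextFor(expr_topl)("value")))
enumList = Group(ZeroOrMore(enumValue + COMMA) + Optional(enumValue))
enum = (Suppress("enum") + Optional(identifier("enum"))
        + LBRACE + enumList("names") + RBRACE)
enum.ignore(cppStyleComment)


def enum_names(text, wanted):
    """Return the member names of the enum called `wanted`, or [] if absent."""
    for tokens, _start, _end in enum.scanString(text):
        # Anonymous enums have no "enum" result name, so get() returns None.
        if tokens.get("enum") == wanted:
            return [entry["name"] for entry in tokens["names"]]
    return []


# Hypothetical input: the enum from the question plus misleading comments.
source = r'''
enum hello {
    minusone = -1,      // fake = 1,
    par1 = ((0,5)),
    /* also_fake, */
    par2 = sizeof("a\")bc};,"),
    par3 = (')')
};
enum other { alpha, beta = 0x10 };
'''

print(enum_names(source, "hello"))
print(enum_names(source, "other"))
```

One caveat: the value pattern is a plain regex, so a comment that follows a value before its comma is not skipped and ends up inside the value text. That does not matter when only the names are collected, unless the comment itself contains a comma, a quote, or a closing brace.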