I'm trying to parse Visual Basic (VBA) function declarations with pyparsing to convert them into Python syntax.
The usual VBA function header is not a major problem; that part works fine for me. But I have difficulties with the argument list:
Public Function MyFuncName(first As Integer, Second As String) As Integer
The arguments consist of a comma-separated list of zero to many parts like:
VarName
VarName As VarType
Optional VarName As VarType = InitValue
ByVal VarName As VarType
where "Optional", "ByVal" and "ByRef" are fully optional, as well as the type declaration.
My idea was to extract the full arguments list from the original line by
allparams = Regex('[^)]*').setResultsName('params')
and then parse them separately. This matches a single parameter:
variablename = Word(alphas + '_', alphanums + '_')
typename = variablename.setResultsName('type')
default_value = Word(alphanums)
optional_term = oneOf('Optional', True)
byval_term = oneOf('ByRef ByVal', True)
paramsparser = Optional(optional_term) \
+Optional(byval_term) \
+variablename.setResultsName('pname', True) \
+Optional('As' + typename) \
+Optional('=' + default_value)
But even with delimitedList(paramsparser)
I only get the first parameter, as this failing test shows:
AssertionError: 'def test(one):\n\tpass' != 'def test(one, two):\n\tpass'
- def test(one):
+ def test(one, two):
? +++++
Do you have any ideas on how to get all of the parameters?
I used your code pretty much as you posted, and wrapped it in a delimitedList
and got both params:
paramsparser = Optional(optional_term) \
+Optional(byval_term) \
+variablename.setResultsName('pname', True) \
+Optional('As' + typename) \
+Optional('=' + default_value)
parser = "(" + delimitedList(paramsparser) + ")"
parser.runTests("""\
(one, two)
(ByRef one As Int = 1, Optional ByVal two As Char)
""")
prints:
(one, two)
['(', 'one', 'two', ')']
- pname: ['one', 'two']
(ByRef one As Int = 1, Optional ByVal two As Char)
['(', 'ByRef', 'one', 'As', 'Int', '=', '1', 'Optional', 'ByVal', 'two', 'As', 'Char', ')']
- pname: ['one', 'two']
- type: 'Char'
But since there are so many fields for each param, I would suggest giving each field a separate results name and wrapping each param in a Group to keep the params from stepping on each other (note how 'type' above holds only the last value, 'Char'). Here is my rework of your parser (very helpful that you posted the various forms for the different optional declaration fields):
from pyparsing import (Word, alphas, alphanums, quotedString, Keyword, Group, Optional, oneOf, delimitedList,
                       Suppress, pyparsing_common as ppc)
LPAR, RPAR, EQ = map(Suppress, "()=")
OPTIONAL, BYREF, BYVAL, AS, FUNCTION = map(Keyword, "Optional ByRef ByVal As Function".split())
# think abstract for expression names, like 'identifier' not 'variablename'; then
# you can use identifier for the variable name, the function name, as a possible
# var type, etc.
identifier = Word(alphas + "_", alphanums + "_")
rvalue = ppc.number() | quotedString() | identifier()
type_expr = identifier()
# add results names when assembling in groups
param_expr = Group(
    Optional(OPTIONAL("optional"))
    + Optional(BYREF("byref") | BYVAL("byval"))
    + identifier("pname")
    + Optional(AS + type_expr("ptype"))
    + Optional(EQ + rvalue("default"))
)
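To show what the Group buys you, here is a minimal, self-contained sketch that parses just a parameter list with the same `param_expr` and reads the named fields from each group. The sample input string and the `summary` variable are only for illustration; I'm also assuming a pyparsing version that still accepts the camelCase names used in this answer.

```python
from pyparsing import (Word, alphas, alphanums, quotedString, Keyword, Group,
                       Optional, delimitedList, Suppress, pyparsing_common as ppc)

EQ = Suppress("=")
OPTIONAL, BYREF, BYVAL, AS = map(Keyword, "Optional ByRef ByVal As".split())

identifier = Word(alphas + "_", alphanums + "_")
rvalue = ppc.number() | quotedString() | identifier()

# Same shape as param_expr above: one Group per parameter
param_expr = Group(
    Optional(OPTIONAL("optional"))
    + Optional(BYREF("byref") | BYVAL("byval"))
    + identifier("pname")
    + Optional(AS + identifier("ptype"))
    + Optional(EQ + rvalue("default"))
)

# Illustrative input only; parseAll=True makes leftover text an error
params = delimitedList(param_expr).parseString(
    "ByRef one As Int = 1, Optional ByVal two As Char", parseAll=True
)

# Each group carries its own names, so nothing gets overwritten;
# .get() returns None for fields that were not present
summary = [(p.pname, p.get("ptype"), p.get("default")) for p in params]
print(summary)
```

Because every parameter sits in its own group, the first param keeps its 'Int' type instead of being overwritten by the second one's 'Char'.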
Then, instead of using a regex to get the params and then re-parse in a separate step, I would just include this in the overall function expression definition:
protection = oneOf("Public Private", asKeyword=True)
func_expr = (
    protection("protection")
    + FUNCTION
    + identifier("fname")
    + Group(LPAR + delimitedList(param_expr) + RPAR)("parameters")
    + Optional(AS + type_expr("return_type"))
)
tests = """
Public Function MyFuncName(first As Integer, Second As String) As Integer
"""
func_expr.runTests(tests)
Prints:
Public Function MyFuncName(first As Integer, Second As String) As Integer
['Public', 'Function', 'MyFuncName', [['first', 'As', 'Integer'], ['Second', 'As', 'String']], 'As', 'Integer']
- fname: 'MyFuncName'
- parameters: [['first', 'As', 'Integer'], ['Second', 'As', 'String']]
  [0]:
    ['first', 'As', 'Integer']
    - pname: 'first'
    - ptype: 'Integer'
  [1]:
    ['Second', 'As', 'String']
    - pname: 'Second'
    - ptype: 'String'
- protection: 'Public'
- return_type: 'Integer'
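Since your goal is to emit Python, here is a rough sketch of how the named results could be turned into a `def` line. The helper `vba_to_python_header` is a hypothetical name of mine, not part of pyparsing, and two choices in it are assumptions rather than anything from your post: an Optional parameter without a default is mapped to `=None`, and the parameter list is wrapped in `Optional(...)` so that an empty `()` also parses. VBA types are simply dropped.

```python
from pyparsing import (Word, alphas, alphanums, quotedString, Keyword, Group,
                       Optional, oneOf, delimitedList, Suppress,
                       pyparsing_common as ppc)

LPAR, RPAR, EQ = map(Suppress, "()=")
OPTIONAL, BYREF, BYVAL, AS, FUNCTION = map(
    Keyword, "Optional ByRef ByVal As Function".split())

identifier = Word(alphas + "_", alphanums + "_")
rvalue = ppc.number() | quotedString() | identifier()

param_expr = Group(
    Optional(OPTIONAL("optional"))
    + Optional(BYREF("byref") | BYVAL("byval"))
    + identifier("pname")
    + Optional(AS + identifier("ptype"))
    + Optional(EQ + rvalue("default"))
)

protection = oneOf("Public Private", asKeyword=True)
func_expr = (
    protection("protection")
    + FUNCTION
    + identifier("fname")
    # Assumption: Optional(...) added here so "()" with no params parses too
    + Group(LPAR + Optional(delimitedList(param_expr)) + RPAR)("parameters")
    + Optional(AS + identifier("return_type"))
)


def vba_to_python_header(line):
    """Hypothetical helper: build a Python stub from a VBA Function header."""
    parsed = func_expr.parseString(line)
    args = []
    for p in parsed.get("parameters", []):
        if "default" in p:
            # Quoted strings keep their quotes, numbers are already converted
            args.append(f"{p.pname}={p.default}")
        elif "optional" in p:
            # Assumption: Optional without a default becomes None
            args.append(f"{p.pname}=None")
        else:
            args.append(p.pname)
    arg_text = ", ".join(args)
    return f"def {parsed.fname}({arg_text}):\n\tpass"


print(vba_to_python_header(
    "Public Function MyFuncName(first As Integer, Second As String) As Integer"))
```

This only handles the header forms shown in the question; anything else (Sub, ParamArray, array types and so on) would need the grammar extended first.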