For reference, here is my code: http://hpaste.org/86949
I am trying to parse the following expression: if (a[1].b[2].c.d[999].e[1+1].f > 3) { }. The method playing up is varExpr, which parses the variable member chains.
Context
In the language I am parsing, a dot specifies access to a member variable. Since a member variable can itself be another object, chains can be produced, e.g. a.b.c, which is essentially (a.b).c. Do not assume the dots are function composition.
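To make the intended grouping concrete, here is a minimal sketch of how such a chain could nest to the left. The Expr type and the chain helper are hypothetical names used only for illustration; they are not taken from the linked code.

```haskell
-- Hypothetical AST for member access; not the types used in the linked paste.
data Expr
  = Var String          -- a plain variable, e.g. a
  | Member Expr String  -- member access on an object, e.g. (a.b).c
  deriving (Show, Eq)

-- Build a left-nested chain from a root name and the member names after it.
-- foldl makes each new member wrap everything parsed so far.
chain :: String -> [String] -> Expr
chain root = foldl Member (Var root)
```

With these definitions, chain "a" ["b", "c"] builds Member (Member (Var "a") "b") "c", which is the (a.b).c reading.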
Implementation
The logic is like this:
First, before <- many vocc collects all the instances of varname . (an identifier followed by a dot) and their optional array expressions, leaving only a single identifier.
Then, this <- vtrm collects the remaining identifier plus its array expression, the only one not followed by a dot.
Issues
I am having two issues:
Firstly, for a reason that I cannot determine, the first term always seems to need to be wrapped in brackets for the parser to accept it, i.e. (a[1]).b[2].c... Subsequent terms do not require this.
Secondly, the many vocc won't stop parsing. It always expects another identifier and another dot, and I am unable to terminate the expression to catch the last vtrm.
I am looking for hints or solutions that will help me solve my problem(s)/headaches. Thanks.
When varExpr runs, it checks whether the next bit of input is matched by vocc or vtrm.
    varExpr = do before <- many vocc  -- Zero or more occurrences
                 this <- vtrm
                 return undefined
The problem is that any input matched by vtrm is also matched by the first step of vocc. When varExpr runs, it runs vocc, which runs vobj, which runs vtrm.
    vocc = vobj <* symbol "."
    vobj = choice [try vtrm, try $ parens vtrm]
Parsing of many vocc ends when vocc fails without consuming input. This happens only when both vtrm and parens vtrm fail. However, after many vocc ends, the next parser to run is vtrm, and since many vocc stopped precisely because vtrm could not match at that point, this parser is sure to fail!
You want vocc to fail without consuming input if it doesn't find a "." in the input. For that, you need to use try.
    vocc = try $ vobj <* symbol "."
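Since the linked code isn't reproduced here, the following is only a rough, self-contained sketch of how the fix could look in context. The Term type, the lexeme/symbol/parens helpers, and the body of vtrm are assumptions standing in for the real definitions; only the shapes of vobj, vocc and varExpr follow the snippets above.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical result type: a name plus an optional index kept as raw text.
data Term = Term String (Maybe String) deriving (Show, Eq)

-- Minimal stand-ins for a token lexer: skip whitespace after each token.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

symbol :: String -> Parser String
symbol = lexeme . string

parens :: Parser a -> Parser a
parens = between (symbol "(") (symbol ")")

-- Assumed shape of vtrm: an identifier followed by an optional [index].
vtrm :: Parser Term
vtrm = Term <$> lexeme (many1 letter)
            <*> optionMaybe (between (symbol "[") (symbol "]")
                                     (many1 (noneOf "]")))

vobj :: Parser Term
vobj = choice [try vtrm, try (parens vtrm)]

-- The outer try rewinds past the term when no "." follows it,
-- so vocc fails without consuming input and many vocc can stop.
vocc :: Parser Term
vocc = try (vobj <* symbol ".")

varExpr :: Parser [Term]
varExpr = do
  before <- many vocc  -- every term that is followed by a dot
  this   <- vtrm       -- the final term, with no trailing dot
  return (before ++ [this])
```

Under these assumptions, parse varExpr "" "a[1].b[2].c" should succeed with three terms, because the last vocc attempt backtracks over c and leaves it for vtrm.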
Alternatively, if vobj and vtrm really should be the same syntax, you can define varExpr as vobj `sepBy1` symbol ".".
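A sketch of that sepBy1 variant follows, again with assumed helper definitions (Term, lexeme, symbol, parens and the body of vtrm are placeholders, not the code from the paste). Note that this version also accepts a parenthesised final term, which the many vocc version does not.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical result type: a name plus an optional index kept as raw text.
data Term = Term String (Maybe String) deriving (Show, Eq)

-- Minimal stand-ins for a token lexer: skip whitespace after each token.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

symbol :: String -> Parser String
symbol = lexeme . string

parens :: Parser a -> Parser a
parens = between (symbol "(") (symbol ")")

-- Assumed shape of vtrm: an identifier followed by an optional [index].
vtrm :: Parser Term
vtrm = Term <$> lexeme (many1 letter)
            <*> optionMaybe (between (symbol "[") (symbol "]")
                                     (many1 (noneOf "]")))

vobj :: Parser Term
vobj = choice [try vtrm, try (parens vtrm)]

-- One or more terms separated by dots. sepBy1 stops cleanly when the
-- separator fails without consuming input, so no extra try is needed here.
varExpr :: Parser [Term]
varExpr = vobj `sepBy1` symbol "."
```

One design difference worth knowing: if a dot has been consumed and no term follows it (for example "a."), sepBy1 reports an error rather than backtracking, which is usually the desired behaviour for a dangling dot.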