
Haskell: Parsec trouble breaking out of pattern


For reference, here is my code: http://hpaste.org/86949

I am trying to parse the following expression: if (a[1].b[2].c.d[999].e[1+1].f > 3) { }. The method playing up is varExpr, which parses the variable member chains.

Context

In the language I am parsing, a dot specifies access to a member variable. Since a member variable can itself be an object, chains can be produced, e.g. a.b.c, which is essentially (a.b).c. Do not assume the dots are function composition.

Implementation

The logic is like this:

  • First, before <- many vocc collects all the instances of varname . and their optional array expression, leaving a single identifier

  • this <- vtrm collects the remaining identifier plus array expression -- the only one not followed by a dot

Issues

I am having two issues:

Firstly, the first term [for a reason that I cannot determine] seems to always require that it be wrapped in brackets for the parser to accept it, i.e. (a[1]).b[2].c... -- subsequent terms do not require this.

Secondly, the many vocc won't stop parsing. It always expects another identifier and another dot and I am unable to terminate the expression to catch the last vtrm.

I am looking for hints or solutions that will help me solve my problem(s)/headaches. Thanks.


Solution

  • When varExpr runs, it checks whether the next bit of input is matched by vocc or vtrm.

    varExpr = do before <- many vocc  -- Zero or more occurrences
                 this <- vtrm
                 return undefined
    

    The problem is that any input matched by vtrm is also matched by the first step of vocc. When varExpr runs, it runs vocc, which runs vobj, which runs vtrm.

    vocc = vobj <* symbol "."
    vobj = choice [try vtrm, try $ parens vtrm]
    

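    Parsing of many vocc ends successfully only when vocc fails without consuming input. This happens when both vtrm and parens vtrm fail. However, after many vocc ends that way, the next parser to run is vtrm, and this parser is sure to fail! On the last term of the chain, vobj succeeds and consumes the term, then symbol "." fails. Since input has already been consumed, vocc fails after consuming input, and many vocc reports an error instead of stopping.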

    You want vocc to fail without consuming input if it doesn't find a "." in the input. For that, you need to use try.

    vocc = try $ vobj <* symbol "."
    

    Alternatively, if vobj and vtrm really should be the same syntax, you can define varExpr as vobj `sepBy1` symbol ".".
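
The two suggestions above can be compared in a small self-contained sketch. The asker's real definitions are in the hpaste link and are not reproduced here, so lexeme, symbol, identifier, parens, the Term type, and the integer-only array index are simplified stand-ins that I am assuming for illustration. Only the shapes of vocc, vobj, and varExpr follow the snippets quoted in this answer.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical lexer helpers; the asker's code presumably has its own.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

symbol :: String -> Parser String
symbol = lexeme . string

identifier :: Parser String
identifier = lexeme (many1 letter)

parens :: Parser a -> Parser a
parens = between (symbol "(") (symbol ")")

-- Assumed result type: a name plus an optional integer index, e.g. b[2].
data Term = Term String (Maybe Integer) deriving (Eq, Show)

-- Simplified vtrm: the index is only an integer literal here.
vtrm :: Parser Term
vtrm = Term <$> identifier <*> optionMaybe index
  where
    index = between (symbol "[") (symbol "]") (read <$> lexeme (many1 digit))

vobj :: Parser Term
vobj = choice [try vtrm, try (parens vtrm)]

-- As in the question: on the last term, vobj consumes input before
-- symbol "." fails, so this fails after consuming input.
voccBroken :: Parser Term
voccBroken = vobj <* symbol "."

-- With try, a missing "." backtracks to the start of the term.
voccFixed :: Parser Term
voccFixed = try (vobj <* symbol ".")

-- varExpr parameterised over the vocc variant, returning all terms.
varExprWith :: Parser Term -> Parser [Term]
varExprWith vocc = do
  before <- many vocc
  this <- vtrm
  return (before ++ [this])

-- The sepBy1 alternative; note the last term may also be parenthesised.
varExprSep :: Parser [Term]
varExprSep = vobj `sepBy1` symbol "."

-- Helper that runs a parser over the whole input string.
parseWith :: Parser a -> String -> Either ParseError a
parseWith p = parse (spaces *> p <* eof) "<input>"
```

Loaded in GHCi, parseWith (varExprWith voccBroken) "a[1].b[2].c" should give a Left with a parse error at the end of the input, while the same call with voccFixed, or parseWith varExprSep on the same string, should give the three terms a[1], b[2], and c.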