json parsing haskell parsec

Fixing a bad JSON grammar


I've just started learning about parsing, and I wrote this simple parser in Haskell (using Parsec) to read JSON and construct a simple tree for it. I am following the grammar in RFC 4627.

However, when I try to parse the string {"x":1 }, I get this error:

parse error at (line 1, column 8):
unexpected "}"
expecting whitespace character or ","

This only seems to happen when there are spaces before a closing bracket (]) or closing brace (}).

What have I done wrong? If I avoid whitespace before a closing symbol, it works perfectly.


Solution

  • Parsec doesn't rewind and backtrack automatically: once a parser has consumed input, Parsec commits to it. When you write sepBy member valueSeparator, valueSeparator consumes the white space in front of the comma, so the parser processes your input like this:

    {"x":1 }
    [------- object
    %        beginObject
     [-]     name
        %    nameSeparator
         %   jvalue
          [- valueSeparator
           X In valueSeparator: unexpected "}"
    
    Legend:
    [--]     full match
    %        full char match
    [--      incomplete match
    X        incomplete char match
    

    When valueSeparator fails, Parsec won't go back and try a different combination of parses, because valueSeparator has already consumed one character (the space). So sepBy does not simply end the member list there; the whole parse fails instead.
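The question's code isn't included, so here is a hypothetical reconstruction that reproduces the error. It assumes the parser transcribes RFC 4627's structural-character rules literally (the RFC defines, for example, value-separator = ws %x2C ws, with white space on both sides), and it uses a deliberately small JSON subset (unescaped strings, non-negative integers, objects only). The names tok, member, object and parseJSON are illustrative, not the asker's.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical reconstruction (the question's code isn't shown): a small
-- JSON subset with unescaped strings, non-negative integers and objects.
data JValue = JNumber Integer | JString String | JObject [(String, JValue)]
  deriving (Eq, Show)

-- RFC 4627: ws = *( %x20 / %x09 / %x0A / %x0D )
ws :: Parser ()
ws = skipMany (oneOf " \t\n\r")

-- RFC 4627 defines each structural character as "ws <char> ws".
-- Transcribed literally, every token also consumes white space *before* it.
tok :: Char -> Parser ()
tok c = ws *> char c *> ws

beginObject, endObject, nameSeparator, valueSeparator :: Parser ()
beginObject    = tok '{'
endObject      = tok '}'
nameSeparator  = tok ':'
valueSeparator = tok ','

-- Simplified string: no escape sequences.
jstring :: Parser String
jstring = char '"' *> many (noneOf "\"") <* char '"'

-- Simplified number: non-negative integers only.
jnumber :: Parser Integer
jnumber = read <$> many1 digit

jvalue :: Parser JValue
jvalue = (JNumber <$> jnumber) <|> (JString <$> jstring) <|> object

member :: Parser (String, JValue)
member = (,) <$> jstring <* nameSeparator <*> jvalue

-- On {"x":1 }, valueSeparator eats the space, then fails on '}'.
-- Because it consumed input, sepBy does not stop; the whole parse fails.
object :: Parser JValue
object = JObject <$> (beginObject *> sepBy member valueSeparator <* endObject)

parseJSON :: String -> Either ParseError JValue
parseJSON = parse (object <* eof) "(input)"
```

In GHCi, parseJSON "{\"x\":1}" returns Right (JObject [("x",JNumber 1)]), while parseJSON "{\"x\":1 }" returns a Left whose message reports unexpected "}", matching the behaviour described in the question.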

    You have two options to solve your problem:

    1. Since white space is insignificant in JSON, always consume white space after a significant token, never before. A tok should therefore consume white space only after its character, i.e. tok c = char c *> ws ((*>) from Control.Applicative, also exported by the Prelude in recent GHC versions); apply the same rule to all the other parsers, and skip any leading white space once, at the start of the document. That way you never consume white space after entering the "wrong" parser, so you never need to backtrack.
    2. Use backtracking in Parsec: wrap in try any parser that can fail after consuming input and should rewind its input when it does, e.g. sepBy member (try valueSeparator).
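Both fixes can be sketched on the same simplified JSON subset (an assumption: unescaped strings, non-negative integers and objects; arrays are left out but follow the object pattern). Option 1 uses tok as defined above plus a hypothetical lexeme helper for leaf values; option 2 keeps the RFC-style tokens, here called sym (also hypothetical), and only wraps the separator in try.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Simplified JSON subset (assumption, not the asker's data type).
data JValue = JNumber Integer | JString String | JObject [(String, JValue)]
  deriving (Eq, Show)

-- RFC 4627: ws = *( %x20 / %x09 / %x0A / %x0D )
ws :: Parser ()
ws = skipMany (oneOf " \t\n\r")

-- Raw leaf parsers; neither touches surrounding white space.
jstring :: Parser String
jstring = char '"' *> many (noneOf "\"") <* char '"'

jnumber :: Parser Integer
jnumber = read <$> many1 digit

-- Option 1: every parser skips white space only *after* what it matches.
lexeme :: Parser a -> Parser a
lexeme p = p <* ws

tok :: Char -> Parser ()
tok c = char c *> ws

value1 :: Parser JValue
value1 = lexeme (JNumber <$> jnumber)
     <|> lexeme (JString <$> jstring)
     <|> object1

-- tok ',' fails on '}' without consuming anything, so sepBy just stops.
object1 :: Parser JValue
object1 = JObject <$> (tok '{' *> sepBy member1 (tok ',') <* tok '}')
  where member1 = (,) <$> lexeme jstring <* tok ':' <*> value1

-- Leading white space of the whole document is skipped once, up front.
parse1 :: String -> Either ParseError JValue
parse1 = parse (ws *> value1 <* eof) "(input)"

-- Option 2: keep white space on both sides of each structural character,
-- but wrap the separator in try so a failed "ws ," rewinds the space.
sym :: Char -> Parser ()
sym c = ws *> char c *> ws

value2 :: Parser JValue
value2 = (JNumber <$> jnumber) <|> (JString <$> jstring) <|> object2

object2 :: Parser JValue
object2 = JObject <$> (sym '{' *> sepBy member2 (try (sym ',')) <* sym '}')
  where member2 = (,) <$> jstring <* sym ':' <*> value2

parse2 :: String -> Either ParseError JValue
parse2 = parse (ws *> value2 <* ws <* eof) "(input)"
```

Both parse1 and parse2 accept {"x":1 } and return Right (JObject [("x",JNumber 1)]). Option 1 is the convention Parsec's own Text.Parsec.Token module follows: its lexeme parsers skip white space after each token, and white space is skipped explicitly only at the start of the main parser.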

    EDIT: updated ASCII graphic to make more sense.