I am trying to parse SGF files (files that describe games of Go). These files contain key-value pairs of the following form (the SGF specification calls them property ids and property values, but I am using "key" and "value" in the hope that people will know what I am talking about when reading the title):
key[value]
or
key[value1][value2]...[valuen]
That is, there may be one or many values. The catch is that the type of the value depends on the key. For example, if the key is B (for playing a black stone), the value is a coordinate described by two letters, for example B[ab]. If the key is AB (for adding a number of black stones when setting up the board), the value is a list of coordinates, like this: AB[ab][cd][fg]. If the key is C (for a comment), the value is just a string: C[this is a comment].
Of course this could be described by the type
type Property = (String, [String])
But I think it would be nicer to have something like
data Property = B Coordinate | AB [Coordinate] | C String ...
Or maybe some other type that better utilizes the type system and won't require that I convert to and from strings all the time.
The problem is that then I would need a parser that depending on the key type returns a different value type, but I think that would cause type problems since a parser can only return one type of value.
How would you parse something like this?
This is actually a straightforward choice, and doesn't need monadic parsing. I'll use the applicative interface to demonstrate this point.
Build a parser for each property id and its property values somewhat like this:
black = B <$> (char 'B' *> coordinate)
white = W <$> (char 'W' *> coordinate)
addBlack = AB <$> (string "AB" *> many1 coordinate)
(assuming you've built a coordinate parser that eats the brackets and returns something of type Coordinate).
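For concreteness, here is a minimal sketch of what such a coordinate parser might look like with Parsec. The Coordinate representation and the parseCoordinate helper are my own assumptions for illustration, not something from the question, and the parser only accepts lowercase letters, which may be narrower than what the SGF specification allows.

```haskell
import Text.Parsec (ParseError, between, char, oneOf, parse)
import Text.Parsec.String (Parser)

-- Hypothetical representation: the two letters of a point, e.g. "ab".
data Coordinate = Coordinate Char Char
  deriving (Show, Eq)

-- Parses one bracketed value such as "[ab]", consuming both brackets.
coordinate :: Parser Coordinate
coordinate = between (char '[') (char ']') (Coordinate <$> point <*> point)
  where
    -- Assumption: only lowercase letters are accepted here.
    point = oneOf ['a' .. 'z']

-- Illustrative helper for trying the parser on a string.
parseCoordinate :: String -> Either ParseError Coordinate
parseCoordinate = parse coordinate "<input>"
```

Because the brackets are handled inside coordinate, the property parsers only have to match the key and then apply coordinate once or many times.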
Each one of those has type Parser Property (with your second, better structured data type), so now we just get the parser to choose between them. If the property ids all have different first letters, when you use the parser for the wrong id, it will fail without consuming input, which is ideal for the choice operator:
myparser = black <|> white <|> addBlack
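But I suspect there's an AW id for adding white stones, which shares its first letter with AB. A parser that fails after consuming input stops <|> from trying the next alternative, so we need to wrap the overlapping parsers in try, which backtracks to where the parser started when it fails: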
mybetterparser = black <|> white <|> (try addBlack <|> try addWhite)
I've bracketed together parsers with a common start and used try on them to go back to the beginning.
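Putting the pieces together, a self-contained sketch could look like the following. The Coordinate type, the comment parser, and the parseProperty helper are assumptions of mine for illustration. The comment parser in particular ignores SGF's escaping rules and simply reads up to the first closing bracket. Only addBlack is wrapped in try here, since the last alternative has nothing to backtrack to.

```haskell
import Text.Parsec
  ( ParseError, between, char, eof, many, many1, noneOf, oneOf
  , parse, string, try, (<|>) )
import Text.Parsec.String (Parser)

-- Hypothetical representation of a point such as "ab".
data Coordinate = Coordinate Char Char
  deriving (Show, Eq)

-- One constructor per property id, each carrying its own value type.
data Property
  = B Coordinate
  | W Coordinate
  | AB [Coordinate]
  | AW [Coordinate]
  | C String
  deriving (Show, Eq)

-- Parses a bracketed two-letter value such as "[ab]" (lowercase only).
coordinate :: Parser Coordinate
coordinate = between (char '[') (char ']') (Coordinate <$> point <*> point)
  where
    point = oneOf ['a' .. 'z']

black, white, addBlack, addWhite, comment :: Parser Property
black    = B  <$> (char 'B' *> coordinate)
white    = W  <$> (char 'W' *> coordinate)
addBlack = AB <$> (string "AB" *> many1 coordinate)
addWhite = AW <$> (string "AW" *> many1 coordinate)
-- Simplification: no handling of escaped ']' inside the comment text.
comment  = C  <$> (char 'C' *> between (char '[') (char ']') (many (noneOf "]")))

-- Every alternative has type Parser Property, so <|> can combine them.
-- try lets "AW..." fall through after string "AB" has consumed the 'A'.
property :: Parser Property
property = black <|> white <|> comment <|> try addBlack <|> addWhite

-- Illustrative helper: parse exactly one property from a string.
parseProperty :: String -> Either ParseError Property
parseProperty = parse (property <* eof) "<sgf>"
```

Each individual parser returns the same type, Property, and the differing value types live inside the constructors, which is why the "a parser can only return one type" worry from the question does not arise.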