haskell, parsec

Why does parsing a string with a minus in it fail?


Why does this fail?

data Value = Num Integer
           | Str String

let numberOrString = liftM Num (try int) <|> liftM Str (many1 (noneOf " "))
parse (numberOrString >> space)  "" "123-4 "

The >> space is required; otherwise the parser would stop after 123.

Expected result:

parse numberOrString "" "1234"
-> Num 1234

parse numberOrString "" "12-34"
-> Str "12-34"

Result:

parse (numberOrString >> space) "" "1234-34 "
-> Left (line 1, column 5):
   unexpected "-"
   expecting digit or space

Solution

  • You do not give the definition of the int parser combinator, but let's assume that it essentially expects one or more digits, optionally with a "-" in front (but only in front!). Now let's walk through your numberOrString parser.

    It parses an integer literal or, failing that, a string of non-space characters. On your example string, the first branch succeeds because it sees a string of digits, and it stops right before the "-" character because that is not a digit. Now numberOrString >> space fails: after the number, the next character is expected to be a space, which "-" is not.

    In effect, you have applied your parser to a string of two integer literals, one positive and one negative (or two positive literals separated by a "-", depending on your grammar). This is also why applying the numberOrString parser alone consumes only "1234": that is the longest integer literal it can parse.

    Edit: my guess is that you want try int to fail if a string of mostly digits contains any non-digit character. Again, this depends on your definition of int, but it is probably defined as a parser that succeeds on any string of at least one digit that is not followed by an alphabetic character. The usual definitions of int succeed on digits followed by non-alphanumeric characters such as "-", even without an intervening space, because that makes the spaces between two integer literals and an infix operator optional. It also lets you parse "123)" without having to write "123 )".
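
If that guess is right, one possible way to get the expected results is to let the number branch succeed only when the token really ends after the digits. The sketch below is not the asker's code: the definition of int is an assumption (optional leading "-", then one or more digits), and strictNumberOrString is a hypothetical name used only for illustration. It relies on Parsec's notFollowedBy, which fails without consuming input if its argument would succeed.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Value = Num Integer
           | Str String
           deriving (Show, Eq)

-- Assumed definition of int (the question does not show it):
-- an optional leading '-' followed by one or more digits.
int :: Parser Integer
int = do
  sign   <- option id (negate <$ char '-')
  digits <- many1 digit
  return (sign (read digits))

-- The parser from the question: it commits to Num as soon as int
-- succeeds, so on "12-34" it stops right before the '-'.
numberOrString :: Parser Value
numberOrString = (Num <$> try int) <|> (Str <$> many1 (noneOf " "))

-- Hypothetical variant: int only counts if no further non-space
-- character follows it. Otherwise try backtracks to the start of
-- the token and the Str branch consumes the whole token.
strictNumberOrString :: Parser Value
strictNumberOrString =
      (Num <$> try (int <* notFollowedBy (noneOf " ")))
  <|> (Str <$> many1 (noneOf " "))
```

With these definitions, parse strictNumberOrString "" "1234" should give Num 1234 and parse strictNumberOrString "" "12-34" should give Str "12-34", while the original numberOrString still stops after the leading digits. Whether this is appropriate depends on the grammar: it deliberately gives up the "123)" convenience described above.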