Parsing a sum datatype with parsec


I am trying to figure out the best way to parse a sum datatype in Haskell. This is an extract of what I attempted:

import Data.Functor.Identity (Identity)
import Text.Parsec

type Value = Int

data Operator = ADD | SUB | MUL | DIV | SQR deriving (Show)

toOperator :: String -> Maybe Operator
toOperator "ADD" = Just ADD
toOperator "SUB" = Just SUB
toOperator "MUL" = Just MUL
toOperator "DIV" = Just DIV
toOperator "SQR" = Just SQR
toOperator _     = Nothing

parseOperator :: ParsecT String u Identity Operator
parseOperator = do
    s <- choice $ map (try . string) ["ADD", "SUB", "MUL", "DIV", "SQR"]
    case toOperator s of
        Just x  -> return x
        Nothing -> fail "Could not parse that operator."

This code does what I want but has one obvious problem: it checks the data twice, once in the line choice $ map (try . string) ["ADD", "SUB", "MUL", "DIV", "SQR"] and once in toOperator.

What I want is to parse a string into an Operator if it occurs in the list, and fail otherwise. But I can't figure out how to do this in a 'clean' way.


Solution

  • It's simpler if you make toOperator participate in the Parsec parsing process directly, rather than having it be a step that happens separately, because then "whether this thing is a valid operator" can provide feedback into the parsing process.

    For this specific case, where the thing you are parsing is a zero-field enum whose constructor names exactly match the strings you are parsing, there are already several good shortcuts posted that show how to parse those constructors concisely. In this answer, I will show an alternative method, which is easier to adapt to the general case of "match one of several cases" and to fancier situations such as "one of the constructors has an Int argument but the others don't."

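    -- StringParser is not defined in this answer; presumably an alias such as Parsec String ().
    operator :: StringParser Operator
    operator = string "ADD" *> pure ADD
           <|> string "DIV" *> pure DIV
           <|> string "MUL" *> pure MUL
           -- try is needed because "SUB" and "SQR" share the prefix 'S':
           -- without it, a failed "SUB" match would already have consumed the 'S'.
           <|> try (string "SUB") *> pure SUB
           <|> string "SQR" *> pure SQR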
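
For anyone who wants to load the parser above into GHCi, here is a minimal self-contained sketch. It assumes that StringParser is an alias for Parsec String () (the answer does not define it), and parseOp is a hypothetical helper added only for testing. It also uses (<$) from the Prelude, which is equivalent to the `*> pure` spelling.

```haskell
import Text.Parsec (Parsec, parse, string, try, (<|>))

-- Assumption: the answer's StringParser is an alias along these lines.
type StringParser = Parsec String ()

data Operator = ADD | SUB | MUL | DIV | SQR deriving (Show, Eq)

-- (<$) replaces a parser's result with a constant, so
-- `ADD <$ string "ADD"` behaves like `string "ADD" *> pure ADD`.
operator :: StringParser Operator
operator = ADD <$ string "ADD"
       <|> DIV <$ string "DIV"
       <|> MUL <$ string "MUL"
       <|> SUB <$ try (string "SUB")  -- backtrack so "SQR" can still match
       <|> SQR <$ string "SQR"

-- Hypothetical helper: run the parser and render any error as a String.
parseOp :: String -> Either String Operator
parseOp = either (Left . show) Right . parse operator ""
```

Each alternative both recognises the keyword and produces the constructor, so there is no second lookup step that could disagree with the first.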
    

    Now suppose that you had an additional constructor, VAR, taking a String argument. It is easy to add support for that constructor to this parser:

    operator :: StringParser Operator
    operator = ...
           <|> string "VAR" *> (VAR <$> var)
    
    var :: StringParser String
    var = spaces *> anyChar `manyTill` space
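
A runnable sketch of the extended parser might look like the following. Several things here are assumptions rather than part of the answer: the StringParser alias, the VAR String constructor added to Operator, the parseOp helper, and the choice to read the variable name with many1 alphaNum. That last change is a deliberate swap: the manyTill version above only succeeds when a whitespace character follows the name, so it fails on input that ends right after the variable.

```haskell
import Text.Parsec (Parsec, parse, string, try, many1, alphaNum, spaces, (<|>))

-- Assumption: StringParser is an alias along these lines.
type StringParser = Parsec String ()

-- Assumption: Operator extended with the VAR constructor discussed above.
data Operator = ADD | SUB | MUL | DIV | SQR | VAR String
  deriving (Show, Eq)

operator :: StringParser Operator
operator = ADD <$ string "ADD"
       <|> DIV <$ string "DIV"
       <|> MUL <$ string "MUL"
       <|> SUB <$ try (string "SUB")  -- "SUB" and "SQR" share the prefix 'S'
       <|> SQR <$ string "SQR"
       <|> string "VAR" *> (VAR <$> var)

-- Skip optional whitespace, then read a non-empty alphanumeric name.
-- Unlike `anyChar `manyTill` space`, this also works at end of input.
var :: StringParser String
var = spaces *> many1 alphaNum

-- Hypothetical helper: run the parser and render any error as a String.
parseOp :: String -> Either String Operator
parseOp = either (Left . show) Right . parse operator ""
```

With these definitions, parseOp "VAR x1" should yield the VAR constructor applied to "x1", while a bare "VAR" with no name is rejected.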