Search code examples
haskellparser-combinatorsmegaparsec

Permutation parsing with megaparsec + parser-combinators too lenient


I'm attempting to parse permutations of flags. The behavior I want is "one or more flags in any order, without repetition". I'm using the following packages:

  • megaparsec
  • parser-combinators

The code I have is outputting what I want, but is too lenient on inputs. I don't understand why it's accepting multiples of the same flags. What am I doing wrong here?

pFlags :: Parser [Flag]
pFlags = runPermutation $ f <$> 
    toPermutation (optional (GroupFlag <$ char '\'')) <*> 
    toPermutation (optional (LeftJustifyFlag <$ char '-'))
    where f a b = catMaybes [a, b]

Examples:

"'-" = [GroupFlag, LeftJustifyFlag] -- CORRECT
"-'" = [LeftJustifyFlag, GroupFlag] -- CORRECT
"''''-" = [GroupFlag, LeftJustifyFlag] -- INCORRECT, should fail if there's more than one of the same flag.

Solution

  • Instead of toPermutation with optional, I believe you need to use toPermutationWithDefault, something like this (untested):

    toPermutationWithDefault Nothing (Just GroupFlag <$ char '\'')
    

    The reasoning is described in the paper “Parsing Permutation Phrases” (PDF) in §4, “adding optional elements” (emph. added):

    Consider, for example […] all permutations of a, b and c. Suppose b can be empty and we want to recognise ac. This can be done in three different ways since the empty b can be recognised before a, after a or after c. Fortunately, it is irrelevant for the result of a parse where exactly the empty b is derived, since order is not important. This allows us to use a strategy similar to the one proposed by Cameron: parse nonempty constituents as they are seen and allow the parser to stop if all remaining elements are optional. When the parser stops the default values are returned for all optional elements that have not been recognised.

    To implement this strategy we need to be able to determine whether a parser can derive the empty string and split it into its default value and its non-empty part, i.e. a parser that behaves the same except that it does not recognise the empty string.

    That is, the permutation parser needs to know which elements can succeed without consuming input, otherwise it will be too eager to commit to a branch. I don’t know why this would lead to accepting multiples of an element, though; perhaps you’re also missing an eof?