Haskell's Parsec supports parsing of permutation phrases via the Perm module. In the documentation for permute
you can see how the various permutations "abc", "aab", "bc", "cbaaaaa", etc. can be parsed. While the example shows support for parting many contiguous instances of the same element like "aaaa", it won't parse non-contiguous instances like "aabca", presumably because each parser is included only once in each permutation (the paper seems to imply this in the tree...)
Besides sorting the input so like instances are contiguous, what options do I have for parsing non-contiguous instance?
Depending on what you actually want, you may be able to use many $ oneOf ['a','b','c']
.
If you really need to use the permutation parser, keep in mind that allowing parses of multiple adjacent characters introduces ambiguity. For example, in the string "bacacbbca"
, it could be parsed as the perms bac
, acb
bca
, or, if you allow repeated characters, bac
, acbb
, with a leftover non-permutation of ca
.
If you allow repeated letters,
{-# LANGUAGE FlexibleContexts #-}
import Text.Parsec
import Text.Parsec.Char
import Text.Parsec.Perm
import Data.Text
import Control.Monad.Identity
perm :: (Stream s Identity Char) => Parsec s u (String,String,String)
perm = permute $ triple <$$>
(many1 $ char 'a') <||>
(many1 $ char 'b') <||>
(many1 $ char 'c')
where triple a b c = (a,b,c)
multiPerm :: (Stream s Identity Char) => Parsec s u [(String,String,String)]
multiPerm = many $ try $ perm
main :: IO ()
main = parseTest multiPerm $ "bacacbbca"
main
produces [("a","b","c"),("a","bb","c")]
.
If not:
perm :: (Stream s Identity Char) => Parsec s u (Char,Char,Char)
perm = permute $ triple <$$>
(char 'a') <||>
(char 'b') <||>
(char 'c')
where triple a b c = (a,b,c)
you get the arguably better: [('a','b','c'),('a','b','c'),('a','b','c')]
.