Equivalent of attoparsec's `inClass` in Parsec


I am translating some code from attoparsec to Parsec, because the parser needs to produce better error messages. The attoparsec code uses inClass (and notInClass) extensively. Is there a similar function for Parsec that lets me translate occurrences of inClass mechanically? Hayoo and Hoogle didn't offer any insight into the matter.

inClass :: String -> Char -> Bool

inClass "a-c'-)0-3-" is equivalent to \ x -> elem x "abc'()0123-", but the latter is inefficient and tedious to write for large ranges.

I will reimplement the function myself if nothing else is available.


Solution

  • There isn't any such combinator; if there were, it would be in Text.Parsec.Char (which is where all the standard parser combinator functions that involve Char are defined). You should be able to define it fairly easily.

    I don't think you'll be able to get the same performance advantages attoparsec does with its implementation, though; it relies on the internal FastSet type, which only works with 8-bit characters. Of course, if you don't need Unicode support, that might not be a problem, but the code for FastSet implies you'll get unpredictable results passing Chars greater than '\255', so if you want to reuse the FastSet-based solution, you'll at least have to read the strings you're parsing in binary mode. (You'll also have to copy the implementation of FastSet into your program, as it's not exported...)
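
    If the 8-bit restriction is the main concern, one alternative to copying FastSet is a precomputed lookup table for code points below 256, with a plain range scan for everything else. The following is only a sketch of that idea, not attoparsec's implementation; the names classRanges and inClassTable are made up for illustration, and it assumes the array package that ships with GHC:

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.Char (chr, ord)

-- Hypothetical helper (not part of attoparsec or Parsec): expand a
-- class string such as "a-c0-3-" into inclusive (low, high) ranges.
classRanges :: String -> [(Char, Char)]
classRanges (a:'-':b:xs) = (a, b) : classRanges xs
classRanges (x:xs)       = (x, x) : classRanges xs
classRanges ""           = []

-- Hypothetical table-backed predicate. Code points below 256 are
-- answered from a 256-entry table; larger ones fall back to a
-- linear scan over the ranges, so Unicode input still works.
inClassTable :: String -> Char -> Bool
inClassTable spec = \c ->
    let n = ord c
    in if n < 256 then table ! n else slow c
  where
    ranges = classRanges spec
    slow c = any (\(a, b) -> a <= c && c <= b) ranges
    table :: UArray Int Bool
    table = listArray (0, 255) [slow (chr n) | n <- [0 .. 255]]
```

    The table is only shared between calls if the partially applied function is bound once, for example with let isHex = inClassTable "0-9a-fA-F". Whether this actually beats a plain range scan for short class strings would have to be measured.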

    If your range strings are short, then a simple solution like this is likely to be pretty fast:

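    -- An inclusive range of characters; a single character c is (c, c).
    type Range = (Char, Char)

    inClass :: String -> Char -> Bool
    inClass = inClass' . parseClass

    -- Turn a class string such as "a-c0-3-" into a list of ranges.
    -- A '-' that is not between two characters is taken literally.
    parseClass :: String -> [Range]
    parseClass "" = []
    parseClass (a:'-':b:xs) = (a, b) : parseClass xs
    parseClass (x:xs) = (x, x) : parseClass xs

    -- Test whether the character falls into any of the ranges.
    inClass' :: [Range] -> Char -> Bool
    inClass' cls c = any (\(a,b) -> c >= a && c <= b) cls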
    

    You could even try something like this, which should be at least as efficient as the version above (including when a single partially applied inClass s is called many times), and which additionally avoids the list traversal overhead:

    inClass :: String -> Char -> Bool
    inClass "" = const False
    inClass (a:'-':b:xs) = \c -> (c >= a && c <= b) || f c where f = inClass xs
    inClass (x:xs) = \c -> c == x || f c where f = inClass xs
    

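    (Note that the recursive call is bound in a where clause outside the lambda, so a shared inClass s does not have to redo the recursion for every character; I don't know whether GHC can or will do this itself.)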
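
    Either definition can then be plugged into Parsec through satisfy, which takes a Char -> Bool predicate, so attoparsec's satisfy (inClass "...") should carry over almost unchanged. A minimal sketch, assuming String input; hexDigits and untilComma are hypothetical example parsers, and notInClass is defined here simply as the negation of inClass:

```haskell
import Text.Parsec (many1, parse, satisfy, (<?>))
import Text.Parsec.String (Parser)

-- The closure-based definition from the answer, repeated so that
-- this sketch is self-contained.
inClass :: String -> Char -> Bool
inClass "" = const False
inClass (a:'-':b:xs) = \c -> (c >= a && c <= b) || f c where f = inClass xs
inClass (x:xs) = \c -> c == x || f c where f = inClass xs

-- Assumed counterpart: the negation of inClass.
notInClass :: String -> Char -> Bool
notInClass s = not . inClass s

-- Hypothetical example: one or more hexadecimal digits, with a label
-- that Parsec uses in its error messages.
hexDigits :: Parser String
hexDigits = many1 (satisfy (inClass "0-9a-fA-F")) <?> "hexadecimal digits"

-- Hypothetical example: everything up to (not including) a comma.
untilComma :: Parser String
untilComma = many1 (satisfy (notInClass ","))
```

    For example, parse hexDigits "" "1aF,zz" should succeed with "1aF" and leave the rest of the input unconsumed. For small literal sets without ranges, Parsec's own oneOf and noneOf may be enough.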