parsing, haskell, parsec

How to parse arbitrary lists with Haskell parsers?


Is it possible to use one of the parsing libraries (e.g. Parsec) to parse something other than a String? And how would I do this?

For the sake of simplicity, let's assume the input is a list of Ints, [Int]. The task could be to:

  • drop leading zeros
  • parse the rest into the pattern (S+L+)*, where S is a number less than 10 and L is a number greater than or equal to 10
  • return a list of tuples (Int,Int), one per S+L+ group, where fst is the product of the S integers and snd is the product of the L integers
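For cross-checking a parser, the task can also be written with plain list functions. This is a sketch, not part of the original question: referenceSolution is a hypothetical name, and it assumes that malformed input (for example a trailing S group with no L group after it) should yield Nothing. Note that once the leading zeros are dropped, any later 0 counts as an S.

```haskell
-- Hypothetical parser-free reference for the task, useful for checking
-- a parser's output. Returns Nothing when the input (after the leading
-- zeros) does not match (S+L+)*.
referenceSolution :: [Int] -> Maybe [(Int, Int)]
referenceSolution = go . dropWhile (== 0)
  where
    go [] = Just []
    go xs =
      let (smalls, rest)  = span (< 10) xs    -- S+: numbers below 10
          (larges, rest') = span (>= 10) rest -- L+: numbers 10 and above
      in if null smalls || null larges
           then Nothing
           else ((product smalls, product larges) :) <$> go rest'
```

For example, referenceSolution [0,0,1,2,55,33,3,5,99] gives Just [(2,1815),(15,99)], and referenceSolution [1,2,55,33,3,5,99,1] gives Nothing.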

It would be great if someone could show how to write such a parser (or something similar).


Solution

  • Yes, as user5402 points out, Parsec can parse any instance of Stream, including arbitrary lists. As there are no predefined parsers for matching individual list elements (like char and satisfy for text), you have to roll your own (myToken below), e.g. using tokenPrim.
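To illustrate the Stream point: Parsec's generic combinators already accept a plain [Int] without any custom token parser. The sketch below uses anyToken and eof; firstTwo and runFirstTwo are illustrative names, not from the answer. One caveat, assuming current parsec behavior: anyToken is defined with a position function that leaves the SourcePos unchanged, so errors from this parser always report column 1. Supplying your own position function is one reason to write a token parser such as myToken.

```haskell
import Text.Parsec

-- Illustrative parser: exactly two elements of an [Int], built only from
-- Parsec's generic combinators (no custom token parser needed).
firstTwo :: Parsec [Int] () (Int, Int)
firstTwo = do
  a <- anyToken   -- takes the next element, whatever it is
  b <- anyToken
  eof             -- succeeds only if the list is exhausted
  return (a, b)

-- Illustrative runner; "list" is just the source name used in errors.
runFirstTwo :: [Int] -> Either ParseError (Int, Int)
runFirstTwo = parse firstTwo "list"
```

For example, runFirstTwo [7, 42] gives Right (7,42), while runFirstTwo [1, 2, 3] fails in eof.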

    The only thing I find a bit awkward is the handling of "source positions". SourcePos is an abstract type (rather than a type class) and forces me to use its "filename/line/column" format, which feels a bit unnatural here.

    Anyway, here is the code (without skipping the leading zeros, for brevity):

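    import Text.Parsec
    
    -- Accept one list element that satisfies the predicate. The position
    -- function advances the column by one per element, so the column in an
    -- error message is 1 + the number of elements consumed so far.
    myToken :: (Show a) => (a -> Bool) -> Parsec [a] () a
    myToken test = tokenPrim show incPos justIf where
      incPos pos _ _ = incSourceColumn pos 1
      justIf x = if test x then Just x else Nothing
    
    small, large :: Parsec [Int] () Int
    small = myToken (< 10)   -- S: a number less than 10
    large = myToken (>= 10)  -- L: a number greater than or equal to 10
    
    -- One S+L+ group, as (product of the S's, product of the L's).
    smallLargePattern :: Parsec [Int] () (Int, Int)
    smallLargePattern = do
      smallints <- many1 small
      largeints <- many1 large
      return (product smallints, product largeints)
    
    myIntListParser :: Parsec [Int] () [(Int,Int)]
    myIntListParser = many smallLargePattern
    
    testMe :: [Int] -> [(Int, Int)]
    testMe xs = case parse myIntListParser "your list" xs of
      Left err -> error $ show err
      Right result -> result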
    

    Trying it all out:

    *Main> testMe [1,2,55,33,3,5,99]
    [(2,1815),(15,99)]
    *Main> testMe [1,2,55,33,3,5,99,1]
    *** Exception: "your list" (line 1, column 9):
    unexpected end of input
    

    Note the awkward line/column format in the error message.
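The answer's code leaves out the leading-zero step and does not require the whole list to be consumed, so an input such as [55] parses to [] with the 55 silently ignored. One way to close both gaps, sketched below, is to skip zeros with skipMany and finish with eof. It reuses myToken and smallLargePattern (with type signatures added); zero, fullParser and parseList are illustrative names, not from the answer.

```haskell
import Text.Parsec

-- The answer's token parser: one element matching the predicate,
-- advancing the column by one per element.
myToken :: (Show a) => (a -> Bool) -> Parsec [a] () a
myToken test = tokenPrim show incPos justIf
  where
    incPos pos _ _ = incSourceColumn pos 1
    justIf x = if test x then Just x else Nothing

-- zero is an illustrative helper for the leading-zero step.
small, large, zero :: Parsec [Int] () Int
small = myToken (< 10)
large = myToken (>= 10)
zero  = myToken (== 0)

-- One S+L+ group, as (product of the S's, product of the L's).
smallLargePattern :: Parsec [Int] () (Int, Int)
smallLargePattern = do
  smallints <- many1 small
  largeints <- many1 large
  return (product smallints, product largeints)

-- Hypothetical full parser: skip leading zeros, parse (S+L+)*, then
-- require the end of the list so leftover elements cause an error
-- instead of being ignored.
fullParser :: Parsec [Int] () [(Int, Int)]
fullParser = skipMany zero *> many smallLargePattern <* eof

-- Hypothetical runner that returns the ParseError instead of calling error.
parseList :: [Int] -> Either ParseError [(Int, Int)]
parseList = parse fullParser "your list"
```

With this version, parseList [0,0,1,2,55,33,3,5,99] gives Right [(2,1815),(15,99)] and parseList [55] is a Left. Because the skipped zeros are consumed as ordinary tokens, they still count towards the column reported in error messages.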

    Of course one could write a function sanitiseSourcePos :: SourcePos -> MyListPosition
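One possible shape for such a function, as a sketch: MyListPosition is a hypothetical newtype holding a 0-based list index, and the conversion is only valid under the assumption that the token parser starts at column 1 and advances the column by exactly one per element, as myToken does. showListError (also hypothetical) reuses showErrorMessages from Text.Parsec.Error to keep Parsec's usual message text while replacing the line/column prefix.

```haskell
import Text.Parsec
import Text.Parsec.Error (errorMessages, showErrorMessages)

-- Hypothetical list-oriented position: a 0-based index into the input list.
newtype MyListPosition = MyListPosition Int
  deriving (Eq, Show)

-- Assumes the token parser starts at column 1 and advances the column by
-- exactly 1 per element (as myToken does), never touching the line.
-- Column c then means c - 1 elements were consumed, so the error refers to
-- list index c - 1 (equal to the list length at end of input).
sanitiseSourcePos :: SourcePos -> MyListPosition
sanitiseSourcePos pos = MyListPosition (sourceColumn pos - 1)

-- Hypothetical renderer: like Parsec's own Show instance for ParseError,
-- but with the list index in place of the (line, column) prefix.
showListError :: ParseError -> String
showListError err =
  "list index " ++ show idx ++ ":" ++
  showErrorMessages "or" "unknown parse error" "expecting" "unexpected"
                    "end of input" (errorMessages err)
  where
    MyListPosition idx = sanitiseSourcePos (errorPos err)
```

With the answer's failing input [1,2,55,33,3,5,99,1], the reported column 9 would become list index 8, i.e. the position just past the last element.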