parsec-3.1.0 with custom token datatype


parsec-3.1.0 ( http://hackage.haskell.org/package/parsec-3.1.0 ) works with any token type. However, combinators like Text.Parsec.Char.satisfy are only defined for the Char datatype, and there doesn't seem to be a more general counterpart available.

Should I define my own versions or did I miss something?

Or perhaps there are other parser libraries in Haskell that allow:

  • custom token types
  • custom parser state (I need to parse a stateful format: Wavefront OBJ)

Solution

  • Generalized versions of oneOf, noneOf, and anyChar can be built out of a generalized satisfy, easily enough:

    import Text.Parsec (ParsecT, Stream, tokenPrim)
    
    oneOfT :: (Eq t, Show t, Stream s m t) => [t] -> ParsecT s u m t
    oneOfT ts = satisfyT (`elem` ts)
    
    noneOfT :: (Eq t, Show t, Stream s m t) => [t] -> ParsecT s u m t
    noneOfT ts = satisfyT (not . (`elem` ts))
    
    anyT :: (Show t, Stream s m t) => ParsecT s u m t
    anyT = satisfyT (const True)
    
    satisfyT :: (Show t, Stream s m t) => (t -> Bool) -> ParsecT s u m t
    satisfyT p = tokenPrim showTok nextPos testTok
        where
          showTok t       = show t
          testTok t       = if p t then Just t else Nothing
          -- Placeholder: leaves the position unchanged. Replace this with
          -- however you update position for your token stream.
          nextPos pos _ _ = pos
    

    It might seem odd that these generalizations are missing from the library, but notice that the versions here make assumptions about the type t that may not hold for someone's token type. It is assumed to be an instance of Show and Eq, yet I can imagine token types that should be displayed some other way than with show, and for which membership in a class of tokens is decided by some method other than == and elem.

    Lastly, once your token type is no longer Char, how you choose to represent position, and thus update it, is highly dependent on your representation of tokens and streams.

    Hence, I can see why a more generalized form doesn't exist.
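
To make those three points concrete, here is a minimal sketch of one way it could look for an OBJ-like format. Everything named here (Tok, Located, describe, matchTok, satisfyTok, the vertex parser, and the sample data) is hypothetical and not from the question. It assumes a separate lexer has already produced a list of tokens, each tagged with its own SourcePos, and it uses parsec's user state (an Int) to count vertices. Tokens are displayed with a hand-written function instead of show, and matching is done with plain predicates instead of == and elem.

```haskell
import Text.Parsec
import Text.Parsec.Pos (newPos)

-- Hypothetical tokens for an OBJ-like format. No Show or Eq instance is needed.
data Tok = TKeyword String | TNumber Double | TNewline

-- Each token carries the source position recorded by the (assumed) lexer.
data Located = Located { locPos :: SourcePos, locTok :: Tok }

-- The user state is an Int, used here to count parsed vertices.
type TokParser = Parsec [Located] Int

-- How tokens appear in error messages, independent of any Show instance.
describe :: Tok -> String
describe (TKeyword k) = "keyword " ++ k
describe (TNumber n)  = "number " ++ show n
describe TNewline     = "end of line"

-- Accept a token if the function returns Just, yielding the extracted value.
matchTok :: (Tok -> Maybe a) -> TokParser a
matchTok f = tokenPrim (describe . locTok) nextPos (f . locTok)
  where
    -- After consuming a token, move to where the next token starts.
    nextPos _   _ (next : _) = locPos next
    nextPos pos _ []         = pos

-- Generalized satisfy for this token type.
satisfyTok :: (Tok -> Bool) -> TokParser Tok
satisfyTok p = matchTok (\t -> if p t then Just t else Nothing)

keyword :: String -> TokParser ()
keyword k = () <$ satisfyTok isK
  where
    isK (TKeyword k') = k == k'
    isK _             = False

number :: TokParser Double
number = matchTok fromNumber <?> "number"
  where
    fromNumber (TNumber n) = Just n
    fromNumber _           = Nothing

newlineTok :: TokParser ()
newlineTok = () <$ satisfyTok isNl
  where
    isNl TNewline = True
    isNl _        = False

-- One already-lexed line of the form: v x y z
vertex :: TokParser (Double, Double, Double)
vertex = do
  keyword "v"
  x <- number
  y <- number
  z <- number
  newlineTok
  modifyState (+ 1)  -- bump the vertex counter kept in user state
  return (x, y, z)

-- All vertex lines, together with the count read back from user state.
vertices :: TokParser ([(Double, Double, Double)], Int)
vertices = do
  vs <- many vertex
  n  <- getState
  return (vs, n)

parseVertices :: [Located] -> Either ParseError ([(Double, Double, Double)], Int)
parseVertices = runParser vertices 0 "sample"

-- Tokens a lexer might produce for the line "v 1.0 2.0 3.0".
sampleLine :: [Located]
sampleLine = zipWith (\col t -> Located (newPos "sample" 1 col) t) [1, 3, 7, 11, 14]
  [TKeyword "v", TNumber 1, TNumber 2, TNumber 3, TNewline]
```

Calling parseVertices sampleLine runs the parser over the sample tokens. Fixing the stream to a list is what lets nextPos peek at the next token's position; for a different stream type you would have to get the position some other way, which is exactly the kind of choice a library-level satisfy could not make for you.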