Search code examples
haskellparsec

How do you parse an Intel Hex Record with applicative functors using the haskell parsec library?


I would like to parse an Intel Hex Record with parsec using the applicative functor style. A typical records looks like the following:

:10010000214601360121470136007EFE09D2190140

The first character is always ':', the next two characters are a hex string representing the number of bytes in the record. The next four characters are a hex string identifying the start address of the data. I had code like the following, but I don't know how to applicatively pass the byte count to the parser that parses the data bytes. My non-working code looks like the following.

line = startOfRecord . byteCount . address . recordType . recordData . checksum
startOfRecord = char ':'
byteCount = toHexValue <$> count 2 hexDigit
address = toHexValue <$> count 4 hexDigit
recordType = toHexValue <$> count 2 hexDigit
recordData c = toHexValue <$> count c hexDigit
recordData c CharParser = count c hexDigit
checksum = toHexValue <$> count 2 hexDigit

toHexValue :: String -> Int
toHexValue = fst . head . readHex

Could anyone help me? Thanks.


Solution

  • There are a number of things not included in your question that you need in order to use parsec. To define things like startOfRecord, we need to disable the dreaded monomorphism restriction. If we want to write type signatures for anything like startOfRecord we also need to enable FlexibleContexts. We also need to import parsec, Control.Applicative, and Numeric (readHex)

    {-# LANGUAGE NoMonomorphismRestriction #-}
    {-# LANGUAGE FlexibleContexts #-}
    
    import Text.Parsec
    import Control.Applicative
    import Numeric (readHex)
    

    I'm also going to use Word8 and Word16 from Data.Word since they exactly match the types used in intel hex records.

    import Data.Word
    

    Ignoring the recordData for a momement, we can define how to read hex values for bytes (Word8) and 16 bit integer addresses (Word16).

    hexWord8 :: (Stream s m Char) => ParsecT s u m Word8
    hexWord8 = toHexValue <$> count 2 hexDigit
    
    hexWord16 :: (Stream s m Char) => ParsecT s u m Word16
    hexWord16 = toHexValue <$> count 4 hexDigit
    
    toHexValue :: (Num a, Eq a) => String -> a
    toHexValue = fst . head . readHex
    

    This lets us define all of the pieces except for recordData.

    startOfRecord = char ':'
    byteCount = hexWord8
    address = hexWord16
    recordType = hexWord8
    checksum = hexWord8
    

    Leaving out recordData, we can now write something like your line in Applicative style. Application in Applicative style is written as <*> (. is function composition or composition in Categorys).

    line = _ <$> startOfRecord <*> byteCount <*> address <*> recordType <*> checksum
    

    The compiler will tell us about the type of the hole _. It says

        Found hole `_'
          with type: Char -> Word8 -> Word16 -> Word8 -> Word8 -> b
    

    If we had a function with that type, we could use it here and make a ParserT that reads something like a record, but still missing the recordData. We'll make a data type to hold all of an intel hex record except for the actual data.

    data IntelHexRecord = IntelHexRecord Word8 Word16 Word8 {- [Word8] -} Word8
    

    If we drop this into line (with const to discard the startOfRecord)

    line = const IntelHexRecord <$> startOfRecord <*> byteCount <*> address <*> recordType <*> checksum
    

    the compiler will tell us that the type of line is a parser for our pseudo-IntelHexRecord.

    *> :t line
    line :: Stream s m Char => ParsecT s u m IntelHexRecord
    

    This is as far as we can go with Applicative style. Let's define how to read the recordData assuming we already somehow know the byteCount.

    recordData :: (Stream s m Char) => Word8 -> ParsecT s u m [Word8]
    recordData c = count (fromIntegral c) hexWord8
    

    We'll also modify IntelHexRecord to have a place to hold the data.

    data IntelHexRecord = IntelHexRecord Word8 Word16 Word8 [Word8] Word8
    

    If you have an Applicative f, there's no way, in general, to choose the structure based on the contents. That's the big difference between an Applicative and a Monad; a Monad's bind, (>>=) :: forall a b. m a -> (a -> m b) -> m b, allows you to choose the structure based on the contents. This is exactly what we need to do to determine how to read the recordData based on the result we obtained earlier by reading the byteCount.

    The easiest way to use one bind >>= in the definition of line is to switch entirely to Monadic style and do-notation.

    line = do
        startOfRecord
        bc   <- byteCount
        addr <- address
        rt   <- recordType
        rd   <- recordData bc
        cs   <- checksum
        return $ IntelHexRecord bc addr rt rd cs