I would like to parse an Intel Hex Record with parsec using the applicative functor style. A typical records looks like the following:
:10010000214601360121470136007EFE09D2190140
The first character is always ':', the next two characters are a hex string representing the number of bytes in the record. The next four characters are a hex string identifying the start address of the data. I had code like the following, but I don't know how to applicatively pass the byte count to the parser that parses the data bytes. My non-working code looks like the following.
line = startOfRecord . byteCount . address . recordType . recordData . checksum
startOfRecord = char ':'
byteCount = toHexValue <$> count 2 hexDigit
address = toHexValue <$> count 4 hexDigit
recordType = toHexValue <$> count 2 hexDigit
recordData c = toHexValue <$> count c hexDigit
recordData c CharParser = count c hexDigit
checksum = toHexValue <$> count 2 hexDigit
toHexValue :: String -> Int
toHexValue = fst . head . readHex
Could anyone help me? Thanks.
There are a number of things not included in your question that you need in order to use parsec. To define things like startOfRecord
, we need to disable the dreaded monomorphism restriction. If we want to write type signatures for anything like startOfRecord
we also need to enable FlexibleContexts
. We also need to import parsec, Control.Applicative
, and Numeric (readHex)
{-# LANGUAGE NoMonomorphismRestriction #-}
{-# LANGUAGE FlexibleContexts #-}
import Text.Parsec
import Control.Applicative
import Numeric (readHex)
I'm also going to use Word8
and Word16
from Data.Word
since they exactly match the types used in intel hex records.
import Data.Word
Ignoring the recordData
for a momement, we can define how to read hex values for bytes (Word8
) and 16 bit integer addresses (Word16
).
hexWord8 :: (Stream s m Char) => ParsecT s u m Word8
hexWord8 = toHexValue <$> count 2 hexDigit
hexWord16 :: (Stream s m Char) => ParsecT s u m Word16
hexWord16 = toHexValue <$> count 4 hexDigit
toHexValue :: (Num a, Eq a) => String -> a
toHexValue = fst . head . readHex
This lets us define all of the pieces except for recordData
.
startOfRecord = char ':'
byteCount = hexWord8
address = hexWord16
recordType = hexWord8
checksum = hexWord8
Leaving out recordData
, we can now write something like your line
in Applicative
style. Application in Applicative
style is written as <*>
(.
is function composition or composition in Category
s).
line = _ <$> startOfRecord <*> byteCount <*> address <*> recordType <*> checksum
The compiler will tell us about the type of the hole _
. It says
Found hole `_'
with type: Char -> Word8 -> Word16 -> Word8 -> Word8 -> b
If we had a function with that type, we could use it here and make a ParserT
that reads something like a record, but still missing the recordData
. We'll make a data type to hold all of an intel hex record except for the actual data.
data IntelHexRecord = IntelHexRecord Word8 Word16 Word8 {- [Word8] -} Word8
If we drop this into line
(with const
to discard the startOfRecord
)
line = const IntelHexRecord <$> startOfRecord <*> byteCount <*> address <*> recordType <*> checksum
the compiler will tell us that the type of line
is a parser for our pseudo-IntelHexRecord
.
*> :t line
line :: Stream s m Char => ParsecT s u m IntelHexRecord
This is as far as we can go with Applicative
style. Let's define how to read the recordData
assuming we already somehow know the byteCount
.
recordData :: (Stream s m Char) => Word8 -> ParsecT s u m [Word8]
recordData c = count (fromIntegral c) hexWord8
We'll also modify IntelHexRecord
to have a place to hold the data.
data IntelHexRecord = IntelHexRecord Word8 Word16 Word8 [Word8] Word8
If you have an Applicative f
, there's no way, in general, to choose the structure based on the contents. That's the big difference between an Applicative
and a Monad
; a Monad
's bind, (>>=) :: forall a b. m a -> (a -> m b) -> m b
, allows you to choose the structure based on the contents. This is exactly what we need to do to determine how to read the recordData
based on the result we obtained earlier by reading the byteCount
.
The easiest way to use one bind >>=
in the definition of line
is to switch entirely to Monad
ic style and do
-notation.
line = do
startOfRecord
bc <- byteCount
addr <- address
rt <- recordType
rd <- recordData bc
cs <- checksum
return $ IntelHexRecord bc addr rt rd cs