After learning few basics I wanted to try a "real world application" in Haskell, started with a Bittorrent client. Following through the explanation from this blog post, I did NOT use the Attoparsec parser combinator library. Instead following through Huttons book, I started writing the Parser Combinators. This is the code that I have so far (Still at the parsing stage, a long journey ahead):
module Main where
import System.Environment (getArgs)
import qualified Data.Map as Map
import Control.Monad (liftM, ap)
import Data.Char (isDigit, isAlpha, isAlphaNum, ord)
import Data.List(foldl')
main :: IO ()
main = do
[fileName] <- getArgs
contents <- readFile fileName
download . parse $ contents
parse :: String -> Maybe BenValue
parse s = case runParser value s of
[] -> Nothing
[(p, _)] -> Just p
download :: Maybe BenValue -> IO ()
download (Just p) = print p
download _ = print "Oh!! Man!!"
data BenValue = BenString String
| BenNumber Integer
| BenList [BenValue]
| BenDict (Map.Map String BenValue)
deriving(Show, Eq)
-- From Hutton, this follows: a Parser is a function
-- that takes a string and returns a list of results
-- each containing a pair : a result of type a and
-- an output string. (the string is the unconsumed part of the input).
newtype Parser a = Parser (String -> [(a, String)])
-- Unit takes a value and returns a Parser (a function)
unit :: a -> Parser a
unit v = Parser (\inp -> [(v, inp)])
failure :: Parser a
failure = Parser (\inp -> [])
one :: Parser Char
one = Parser $ \inp -> case inp of
[] -> []
(x: xs) -> [(x, xs)]
runParser :: Parser a -> String -> [(a, String)]
runParser (Parser p) inp = p inp
bind :: Parser a -> (a -> Parser b) -> Parser b
bind (Parser p) f = Parser $ \inp -> case p inp of
[] -> []
[(v, out)] -> runParser (f v) out
instance Monad Parser where
return = unit
p >>= f = bind p f
instance Applicative Parser where
pure = unit
(<*>) = ap
instance Functor Parser where
fmap = liftM
choice :: Parser a -> Parser a -> Parser a
choice p q = Parser $ \inp -> case runParser p inp of
[] -> runParser q inp
x -> x
satisfies :: (Char -> Bool) -> Parser Char
satisfies p = do
x <- one
if p x
then unit x
else failure
digit :: Parser Char
digit = satisfies isDigit
letter :: Parser Char
letter = satisfies isAlpha
alphanum :: Parser Char
alphanum = satisfies isAlphaNum
char :: Char -> Parser Char
char x = satisfies (== x)
many :: Parser a -> Parser [a]
many p = choice (many1 p) (unit [])
many1 :: Parser a -> Parser [a]
many1 p = do
v <- p
vs <- many p
unit (v:vs)
peek :: Parser Char
peek = Parser $ \inp -> case inp of
[] -> []
v@(x:xs) -> [(x, v)]
taken :: Int -> Parser [Char]
taken n = do
if n > 0
then do
v <- one
vs <- taken (n-1)
unit (v:vs)
else unit []
takeWhile1 :: (Char -> Bool) -> Parser [Char]
takeWhile1 pred = do
v <- peek
if pred v
then do
one
vs <- takeWhile1 pred
unit (v:vs)
else unit []
decimal :: Integral a => Parser a
decimal = foldl' step 0 `fmap` takeWhile1 isDigit
where step a c = a * 10 + fromIntegral (ord c - 48)
string :: Parser BenValue
string = do
n <- decimal
char ':'
BenString <$> taken n
signed :: Num a => Parser a -> Parser a
signed p = (negate <$> (char '-' *> p) )
`choice` (char '+' *> p)
`choice` p
number :: Parser BenValue
number = BenNumber <$> (char 'i' *> (signed decimal) <* char 'e')
list :: Parser BenValue
list = BenList <$> (char 'l' *> (many value) <* char 'e')
dict :: Parser BenValue
dict = do
char 'd'
pair <- many ((,) <$> string <*> value)
char 'e'
let pair' = (\(BenString s, v) -> (s,v)) <$> pair
let map' = Map.fromList pair'
unit $ BenDict map'
value = string `choice` number `choice` list `choice` dict
The above is a mix of code read/understood from the source code of the three sources the blog, the library, and the book. the download
function just prints the "parse tree", obtained from the parser, Once I get the parser working will fill in the download
function and test it out.
:trace
/ :history
style debugging mentioned in this document, but looks very primitive :-) .Thanks.
Because Haskell code is pure, "stepping" through it is less essential than in other languages. When I step through some Java code, I am often trying to see where a certain variable gets changed. That is obviously a non-issue in Haskell given that things are immutable.
That means we can also run snippets of code in GHCi to debug what is happening without worrying that what we run is going to change some global state, or what we run will work any differently than how it would if called deep inside our program. This mode of work benefits from iterating your design slowly building it to work on the full range of expected inputs.
Parsing is always a bit unpleasant - even in imperative languages. Nobody wants to run a parser just to get back Nothing
- you want to know why you got back nothing. To that effect, most parsers libraries are helpful in giving you some information about what went wrong. That is a point for using a parser like attoparsec
. Also, attoparsec
works with ByteString
by default - perfect for binary data. If you want to roll your own parser implementation, you'll have to debug it too.
Finally, based on your comments, it looks like you are having issues with character encodings. This is exactly the reason why we have ByteString
- it represents a packed sequence of bytes - no encodings. The extension OverloadedStrings
even makes it pretty easy to make ByteString
literals that look just like regular strings.