Search code examples
debugginghaskellbittorrent

debugging a Haskell application


After learning few basics I wanted to try a "real world application" in Haskell, started with a Bittorrent client. Following through the explanation from this blog post, I did NOT use the Attoparsec parser combinator library. Instead following through Huttons book, I started writing the Parser Combinators. This is the code that I have so far (Still at the parsing stage, a long journey ahead):

module Main where

import System.Environment (getArgs)
import qualified Data.Map as Map
import Control.Monad (liftM, ap)
import Data.Char (isDigit, isAlpha, isAlphaNum, ord)
import Data.List(foldl')

main :: IO ()
main = do
    [fileName] <- getArgs
    contents <- readFile fileName
    download . parse $ contents

parse :: String -> Maybe BenValue
parse s = case runParser value s of
    []       -> Nothing
    [(p, _)] -> Just p

download :: Maybe BenValue -> IO ()
download (Just p) = print p
download _        = print "Oh!! Man!!"

data BenValue = BenString String
    | BenNumber Integer
    | BenList [BenValue]
    | BenDict (Map.Map String BenValue)
    deriving(Show, Eq)

-- From Hutton, this follows: a Parser is a function 
-- that takes a string and returns a list of results
-- each containing a pair : a result of type a and 
-- an output string. (the string is the unconsumed part of the input).
newtype Parser a = Parser (String -> [(a, String)])

-- Unit takes a value and returns a Parser (a function)
unit :: a -> Parser a
unit v = Parser (\inp -> [(v, inp)])

failure :: Parser a
failure = Parser (\inp -> [])

one :: Parser Char
one = Parser $ \inp -> case inp of
        []      -> []
        (x: xs) -> [(x, xs)]

runParser :: Parser a -> String -> [(a, String)]
runParser (Parser p) inp = p inp

bind :: Parser a -> (a -> Parser b) -> Parser b
bind (Parser p) f = Parser $ \inp -> case p inp of
    []         -> []
    [(v, out)] -> runParser (f v) out

instance Monad Parser where
    return  = unit
    p >>= f = bind p f

instance Applicative Parser where
    pure  = unit
    (<*>) = ap

instance Functor Parser where
    fmap = liftM

choice :: Parser a -> Parser a -> Parser a
choice p q = Parser $ \inp -> case runParser p inp of
    [] -> runParser q inp
    x  -> x

satisfies :: (Char -> Bool) -> Parser Char
satisfies p = do
    x <- one
    if p x 
    then unit x
    else failure

digit :: Parser Char
digit = satisfies isDigit

letter :: Parser Char
letter = satisfies isAlpha

alphanum :: Parser Char
alphanum = satisfies isAlphaNum

char :: Char -> Parser Char
char x = satisfies (== x)

many :: Parser a -> Parser [a]
many p = choice (many1 p) (unit [])

many1 :: Parser a -> Parser [a]
many1 p = do
    v <- p
    vs <- many p
    unit (v:vs)

peek :: Parser Char
peek = Parser $ \inp -> case inp of
    []       -> []
    v@(x:xs) -> [(x, v)]    

taken :: Int -> Parser [Char]
taken n = do
    if n > 0
    then do
        v <- one
        vs <- taken (n-1)
        unit (v:vs)
    else unit []

takeWhile1 :: (Char -> Bool) -> Parser [Char]
takeWhile1 pred = do
    v <- peek
    if pred v
    then do
        one
        vs <- takeWhile1 pred
        unit (v:vs)
    else unit []

decimal :: Integral a => Parser a
decimal = foldl' step 0 `fmap` takeWhile1 isDigit
    where step a c = a * 10 + fromIntegral (ord c - 48)

string :: Parser BenValue
string = do
    n <- decimal
    char ':'
    BenString <$> taken n

signed :: Num a => Parser a -> Parser a
signed p = (negate <$> (char '-' *> p) )
    `choice` (char '+' *> p)
    `choice` p

number :: Parser BenValue
number = BenNumber <$> (char 'i' *> (signed decimal) <* char 'e')

list :: Parser BenValue
list = BenList <$> (char 'l' *> (many value) <* char 'e')

dict :: Parser BenValue
dict = do
    char 'd'
    pair <- many ((,) <$> string <*> value)
    char 'e'
    let pair' = (\(BenString s, v) -> (s,v)) <$> pair
    let map' = Map.fromList pair'
    unit $ BenDict map'

value = string `choice` number `choice` list `choice` dict

The above is a mix of code read/understood from the source code of the three sources the blog, the library, and the book. the download function just prints the "parse tree", obtained from the parser, Once I get the parser working will fill in the download function and test it out.

  1. The parser is NOT working on few of the torrent files. :( There is definitely chance that I might have used code from the references incorrectly. And would like to know if there is anything obvious.
  2. It works on "toy" examples and also on the test file picked from combinatorrent
  3. When I pick a real world torrent like the Debian/Ubuntu etc, this fails.
  4. I would like to debug and see what is happening, debugging with GHCI does not seem straight forward, I've tried :trace / :history style debugging mentioned in this document, but looks very primitive :-) .
  5. My question to the experts out there is: "how to debug!!" :-)
  6. Would really appreciate any hints on approaching how to debug this.

Thanks.


Solution

  • Because Haskell code is pure, "stepping" through it is less essential than in other languages. When I step through some Java code, I am often trying to see where a certain variable gets changed. That is obviously a non-issue in Haskell given that things are immutable.

    That means we can also run snippets of code in GHCi to debug what is happening without worrying that what we run is going to change some global state, or what we run will work any differently than how it would if called deep inside our program. This mode of work benefits from iterating your design slowly building it to work on the full range of expected inputs.

    Parsing is always a bit unpleasant - even in imperative languages. Nobody wants to run a parser just to get back Nothing - you want to know why you got back nothing. To that effect, most parsers libraries are helpful in giving you some information about what went wrong. That is a point for using a parser like attoparsec. Also, attoparsec works with ByteString by default - perfect for binary data. If you want to roll your own parser implementation, you'll have to debug it too.

    Finally, based on your comments, it looks like you are having issues with character encodings. This is exactly the reason why we have ByteString - it represents a packed sequence of bytes - no encodings. The extension OverloadedStrings even makes it pretty easy to make ByteString literals that look just like regular strings.