Search code examples
parsinghaskellinterpreterparsec

Using Haskell's Parsec for Programming Language Converter


Say I have two languages (A & B). My goal is to write some type of program to convert the syntax found in A to the equivalent of B. Currently my solution has been to use Haskell's Parsec to perform this task. As someone who is new to Haskell and functional programming for that matter however, finding just a simple example in Parsec has been quite difficult. The examples I have found on the web are either incomplete examples (frustrating for a new Haskell programmer) or too much removed from my goal.

So can someone provide me with an amazingly trivial and explicit example of using Parsec for something related to what I'd like to achieve? Or perhaps even some tutorial that parallels my goal as well.

Thanks.


Solution

  • Consider the following simple grammar of a CSV document (In ABNF):

    csv   = *crow
    crow  = *(ccell ',') ccell CR
    ccell = "'" *(ALPHA / DIGIT) "'"
    

    We want to write a converter that converts this grammar into a TSV (tabulator separated values) document:

    tsv   = *trow
    trow  = *(tcell HTAB) tcell CR
    tcell = DQUOTE *(ALPHA / DIGIT) DQUOTE
    

    First of all, let's create an algebraic data type that descibes our abstract syntax tree. Type synonyms are included to ease understandment:

    data XSV  = [Row]
    type Row  = [Cell]
    type Cell = String
    

    Writing a parser for this grammar is pretty simple. We write a parser as if we would describe the ABNF:

    csv :: Parser XSV
    csv = XSV <$> many crow
    
    crow :: Parser Row
    crow = do cells <- ccell `sepBy` (char ',')
              newline
              return cells
    
    ccell :: Parser Cell
    ccell = do char '\''
               content <- many (digit <|> letter)
               char '\''
               return content
    

    This parser uses do-notation. After a do, a sequence of statements follows. For parsers, these statements are simply other parsers. One can use <- to bind the result of a parser. This way, one builds a big parser by chaining multiple smaller parsers. To obtain interesting effects, one can also combine parser using special combinators (such as a <|> b, which parses either a or b or many a, which parses as many as as possible). Please be aware that Parsec does not backtrack by default. If a parser might fail after consuming characters, prefix it with try to enable backtracking for one instance. try slows down parsing.

    The result is a parser csv that parses our CSV document into an abstract syntax tree. Now it is easy to turn that into another language (such as TSV):

    xsvToTSV :: XSV -> String
    xsvToTSV xst = unlines (map toLines xst) where
      toLines = intersperse '\t'
    

    Connecting these two things one gets a conversion function:

    csvToTSV :: String -> Maybe String
    csvToTSV document = case parse csv "" document of
      Left _    -> Nothing
      Right xsv -> xsvToTSV xsv
    

    And that is all! Parsec has lots of other functions to build up extremely sophisticated parsers. The book Real World Haskell has a nice chapter about parsers, but it's a little bit outdated. Most of that is still true, though. If you have further questions, feel free to ask.