Around 6 years ago, I benchmarked my own parser combinators in OCaml and found that they were ~5× slower than the parser generators on offer at the time. I recently revisited this subject and benchmarked Haskell's Parsec vs a simple hand-rolled precedence climbing parser written in F# and was surprised to find the F# to be 25× faster than the Haskell.
Here's the Haskell code I used to read a large mathematical expression from file, parse and evaluate it:
import Control.Applicative
import Text.Parsec hiding ((<|>))
expr = chainl1 term ((+) <$ char '+' <|> (-) <$ char '-')
term = chainl1 fact ((*) <$ char '*' <|> div <$ char '/')
fact = read <$> many1 digit <|> char '(' *> expr <* char ')'
eval :: String -> Int
eval = either (error . show) id . parse expr "" . filter (/= ' ')
main :: IO ()
main = do
file <- readFile "expr"
putStr $ show $ eval file
putStr "\n"
and here's my self-contained precedence climbing parser in F#:
let rec (|Expr|) = function
| P(f, xs) -> Expr(loop (' ', f, xs))
| xs -> invalidArg "Expr" (sprintf "%A" xs)
and loop = function
| ' ' as oop, f, ('+' | '-' as op)::P(g, xs)
| (' ' | '+' | '-' as oop), f, ('*' | '/' as op)::P(g, xs) ->
let h, xs = loop (op, g, xs)
match op with
| '+' -> (+) | '-' -> (-) | '*' -> (*) | '/' | _ -> (/)
|> fun op -> loop (oop, op f h, xs)
| _, f, xs -> f, xs
and (|P|_|) = function
| '('::Expr(f, ')'::xs) -> Some(P(f, xs))
| c::_ as xs when '0' <= c && c <= '9' ->
let rec loop n = function
| c2::xs when '0' <= c2 && c2 <= '9' -> loop (10*n + int(string c2)) xs
| xs -> Some(P(n, xs))
loop 0 xs
| _ -> None
My impression is that even state-of-the-art parser combinators waste a lot of time back tracking. Is that correct? If so, is it possible to write parser combinators that generate state machines to obtain competitive performance or is it necessary to use code generation?
EDIT:
Here's the OCaml script I used to generate a ~2Mb expression for benchmarking:
open Printf
let rec f ff n =
if n=0 then fprintf ff "1" else
fprintf ff "%a+%a*(%a-%a)" f (n-1) f (n-1) f (n-1) f (n-1)
let () =
let n = try int_of_string Sys.argv.(1) with _ -> 3 in
fprintf stdout "%a\n" f n
I'm currently working on the next version of FParsec (v. 0.9), which will in many situations improve performance by up to a factor of 2 relative to the current version.
[Update: FParsec 0.9 has been released, see http://www.quanttec.com/fparsec ]
I've tested Jon's F# parser implementation against two FParsec implementations. The first FParsec parser is a direct translation of djahandarie's parser. The second one uses FParsec's embeddable operator precedence component. As the input I used a string generated with Jon's OCaml script with parameter 10, which gives me an input size of about 2.66MB. All parsers were compiled in release mode and were run on the 32-bit .NET 4 CLR. I only measured the pure parsing time and didn't include startup time or the time needed for constructing the input string (for the FParsec parsers) or the char list (Jon's parser).
I measured the following numbers (updated numbers for v. 0.9 in parens):
In light of these numbers, I'd say that parser combinators can definitely offer competitive performance, at least for this particular problem, especially if you take into account that FParsec
Here's the code for the two FParsec implementations:
Parser #1 (Translation of djahandarie's parser):
open FParsec
let str s = pstring s
let expr, exprRef = createParserForwardedToRef()
let fact = pint32 <|> between (str "(") (str ")") expr
let term = chainl1 fact ((str "*" >>% (*)) <|> (str "/" >>% (/)))
do exprRef:= chainl1 term ((str "+" >>% (+)) <|> (str "-" >>% (-)))
let parse str = run expr str
Parser #2 (Idiomatic FParsec implementation):
open FParsec
let opp = new OperatorPrecedenceParser<_,_,_>()
type Assoc = Associativity
let str s = pstring s
let noWS = preturn () // dummy whitespace parser
opp.AddOperator(InfixOperator("-", noWS, 1, Assoc.Left, (-)))
opp.AddOperator(InfixOperator("+", noWS, 1, Assoc.Left, (+)))
opp.AddOperator(InfixOperator("*", noWS, 2, Assoc.Left, (*)))
opp.AddOperator(InfixOperator("/", noWS, 2, Assoc.Left, (/)))
let expr = opp.ExpressionParser
let term = pint32 <|> between (str "(") (str ")") expr
opp.TermParser <- term
let parse str = run expr str