Search code examples
fparsec

Why does FParsec use lists?


I thought I'd try writing a fast parser using FParsec and quickly realised that many returning a list is a seriously performance problem. Then I discovered an alternative that uses a ResizeArray in the docs:

let manyA2 p1 p =
    Inline.Many(firstElementParser = p1,
                elementParser = p,
                stateFromFirstElement = (fun x0 ->
                                             let ra = ResizeArray<_>()
                                             ra.Add(x0)
                                             ra),
                foldState = (fun ra x -> ra.Add(x); ra),
                resultFromState = (fun ra -> ra.ToArray()),
                resultForEmptySequence = (fun () -> [||]))

let manyA p = manyA2 p p

Using this in my code instead makes it run several times faster. So why does FParsec use lists by default instead of ResizeArray?


Solution

  • Using the built-in F# list type as the result type for the sequence combinators makes the combinators more convenient to use in F# and arguably leads to more idiomatic client-side code. Since most F# developers value simplicity and elegance over performance (at least in my experience) using lists as the default seemed like the right choice when I designed the API. At the same time I tried to make it easy for users to define their own specialized sequence combinators.

    Currently the sequence combinators that return a list also use a list internally for building up the sequence. This is suboptimal for sequences with more than about 2 elements, as the list has to be reversed before it is returned. However, I'm not sure whether changing the implementation would be worth the effort, since if your parser is performance sensitive and you're parsing long sequence you're better off not using lists at all.

    I probably should add a section about using arrays instead of lists to the user's guide chapter on performance.