Search code examples
scalaparser-combinators

How to specify a range of repetitions in Scala's parser combinators?


Scala's parser combinators allow to specify some repetitions as 0 or more (.*), 1 or more (.+), etc. Is it possible to specify a range? For example, rep(p, q)(parser) where parser is run at least p times and up to q times.


Solution

  • The Combinator Operations in Scala's Parser Combinators are roughly modeled after Regular Expressions. Regular Expressions only have three Combinator Operations:

    Given two Regular Expressions R and S,

    • RS is a Regular Expression (Concatenation)
    • R | S is a Regular Expression (Alternation)
    • R* is a Regular Expression (Kleene Star)

    That's it. Scala adds a couple more, most notably + and ? but that doesn't actually increase the power since R+ is actually just RR* and R? is just R | ε. The same applies to your proposed rep combinator. It does not actually increase the power since

    R{m, n}
    

    is actually just

    RRRRRRRRRRR(ε | R | RR | RRR | RRRR | RRRRR | RRRRRR | RRRRRRR | RRRRRRRR | RRRRRRRRR)
     ↑↑ m×R ↑↑                                                                ↑↑ (m-n)×R ↑↑
    

    Now, of course, while this doesn't increase the power of the Parsers, it does increase the expressivity and thus readability and maintainability.

    It would be pretty easy to build it on top of | and repN, I believe.

    Something like:

    /** A parser generator for a number of repetitions within a range.
     *
     *  `repMN(m, n, p)` uses `p` between `m` and `n` time to parse the input
     *  (the result is a `List` of the consecutive results of `p`).
     *
     * @param p   a `Parser` that is to be applied successively to the input
     * @param min the minimum number of times `p` must succeed
     * @param max the maximum number of times `p` must succeed
     * @return    A parser that returns a list of results produced by repeatedly applying `p` to the input
     *        (and that only succeeds if `p` matches between `m` and `n` times).
     */
    def repMN[T](min: Int, max: Int, p: ⇒ Parser[T]) = 
      (min to max).reverse map { repN(_, p) } reduce {_ | _}
    

    This looks useful enough that it might even make sense to file as an enhancement request.