
Scala: How to convert a parser combinator result to a List[String]?


I am trying to write a parser for a language very similar to Milner's CCS. Basically what I am parsing so far are expressions of the following sort:

  • a.b.a.1
  • a.0

An expression must start with a letter (excluding t) and may be followed by any number of further letters, each separated by a '.'. The expression must terminate with a digit (for simplicity I chose digits between 0 and 2 for now). I want to use Scala's parser combinators, but this is the first time I am working with them. This is what I have so far:

import scala.util.parsing.combinator._

class SimpleParser extends RegexParsers {
  def alpha: Parser[String] = """[^t]{1}""".r ^^ { _.toString }
  def digit: Parser[Int] = """[0-2]{1}""".r ^^ { _.toInt }

  def expr: Parser[Any] = alpha ~ "." ~ digit ^^ {
    case al ~ "." ~ di => List(al, di)
  }

  def simpleExpression: Parser[Any] = alpha ~ "." ~ rep(alpha ~ ".") ~ digit //^^ {  }
}

As you can see in def expr: Parser[Any], I am trying to return the result as a list, since lists in Scala are very easy to work with (in my opinion). Is this the correct way to convert a Parser[Any] result to a List? Can anyone give me tips on how to do this for def simpleExpression: Parser[Any]?

The main reason I want to use lists is that after parsing an expression I want to be able to consume it. For example, given the expression a.b.1, if I am given an 'a', I would like to consume the expression to end up with a new expression: b.1 (i.e. a.b.1 ->(a)-> b.1). The idea behind this is to simulate finite-state automata. Any tips on how I may improve my implementation are appreciated.


Solution

  • To keep things type-safe, I recommend a parser that produces a tuple of a list of strings and an int. That is, the input a.b.a.1 would be parsed as (List("a", "b", "a"), 1). Note also that the regex for alpha was changed so that it matches only lowercase letters other than t; the original [^t] accepts any character except t, including '.' and digits.

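    import scala.util.parsing.combinator._

    class SimpleParser extends RegexParsers {
      // A single lowercase letter other than 't'; a Regex already yields a Parser[String].
      def alpha: Parser[String] = """[a-su-z]""".r
      // A single digit between 0 and 2, converted to Int.
      def digit: Parser[Int] = """[0-2]""".r ^^ { _.toInt }
    
      // One or more letters separated by '.'; the separators are dropped from the result.
      def repAlpha: Parser[List[String]] = rep1sep(alpha, ".")
    
      // The letters, a final '.', then the terminating digit, returned as a typed tuple.
      def expr: Parser[(List[String], Int)] = repAlpha ~ "." ~ digit ^^ {
        case alphas ~ _ ~ num =>
          (alphas, num)
      }
    }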
    

    With an instance of this SimpleParser, here's the output I got:

    val parser = new SimpleParser
    println(parser.parse(parser.expr, "a.b.a.1"))
    // [1.8] parsed: (List(a, b, a),1)
    
    println(parser.parse(parser.expr, "a.0"))
    // [1.4] parsed: (List(a),0)
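
As for consuming an expression (a.b.1 ->(a)-> b.1), the tuple produced by expr makes this a matter of pattern matching on the head of the list. Below is a minimal sketch of that idea, not a definitive implementation. It assumes you have already extracted the tuple from the parse result (for example by matching on parser.Success(result, _)); the names ConsumeSketch, Expr, step and run are hypothetical and only serve the illustration.

```scala
object ConsumeSketch {
  // Hypothetical alias for the shape returned by expr:
  // the remaining action letters plus the terminating digit.
  type Expr = (List[String], Int)

  // Consume a single action. Succeeds only if `action` is the first
  // letter of the expression; otherwise there is no transition.
  def step(expr: Expr, action: String): Option[Expr] = expr match {
    case (head :: tail, n) if head == action => Some((tail, n))
    case _                                   => None
  }

  // Consume a sequence of actions, stopping at the first mismatch.
  def run(expr: Expr, actions: List[String]): Option[Expr] =
    actions.foldLeft(Option(expr))((acc, a) => acc.flatMap(step(_, a)))

  def main(args: Array[String]): Unit = {
    val parsed: Expr = (List("a", "b"), 1) // the result of parsing "a.b.1"

    println(step(parsed, "a"))           // Some((List(b),1))
    println(step(parsed, "b"))           // None
    println(run(parsed, List("a", "b"))) // Some((List(),1))
  }
}
```

Returning an Option keeps a failed transition explicit instead of throwing, and an empty list with the digit left over marks an expression whose actions have all been consumed.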