Using keep-left/right combinator is not working with result converter


I have a combinator and a result converter that look like this:

// parses a line like so:
// 
// 2
// 00:00:01.610 --> 00:00:02.620 align:start position:0%
//
private def subtitleHeader: Parser[SubtitleBlock] = {
  (subtitleNumber ~ whiteSpace).? ~>
    time ~ arrow ~ time ~ opt(textLine) ~ eol
} ^^ {
  case
    startTime ~ _ ~ endTime ~ _ ~ _
  => SubtitleBlock(startTime, endTime, List(""))
}

Because the results of arrow, textLine and eol are not important to my result converter, I was hoping I could use <~ and ~> in the right places within my combinator so that my converter doesn't have to deal with them. As an experiment, I changed the first ~ in the parser to <~ and removed the ~ _ where the "arrow" would be matched in the case statement, like so:

private def subtitleHeader: Parser[SubtitleBlock] = {
  (subtitleNumber ~ whiteSpace).? ~>
    time <~ arrow ~ time ~ opt(textLine) ~ eol
} ^^ {
  case
    startTime ~ endTime ~ _ ~ _
  => SubtitleBlock(startTime, endTime, List(""))
}

However, I get red-squigglies in IntelliJ with the error message:

Error:(44, 31) constructor cannot be instantiated to expected type;
 found   : caption.vttdissector.VttParsers.~[a,b]
 required: Int
    startTime ~ endTime ~ _ ~ _

What am I doing wrong?


Solution

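  • Since you didn't insert any parentheses in the chain of ~ and <~, most matched subexpressions are thrown out "with the bathwater" (or rather "with the whitespace and arrows"). In Scala, an operator's precedence is determined by its first character, so ~ and ~> bind more tightly than <~. Your second version therefore groups as (... ~> time) <~ (arrow ~ time ~ opt(textLine) ~ eol): everything to the right of <~, including the end time, is discarded, and the parser yields only the first time. That is why the compiler expects a plain Int (apparently the result type of your time parser, judging by the error message) instead of a ~ pattern. Just insert some parentheses.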

    Here is the general pattern it should follow:

    (irrelevant ~> irrelevant ~> RELEVANT <~ irrelevant <~ irrelevant) ~
    (irrelevant ~> RELEVANT <~ irrelevant <~ irrelevant) ~ 
    ...
    

    i.e. every "relevant" subexpression is surrounded by irrelevant stuff and a pair of parentheses, and then the parenthesized subexpressions are connected by ~'s.

    Your example:

    import scala.util.parsing.combinator._
    import scala.util.{Either, Left, Right}
    
    case class SubtitleBlock(startTime: String, endTime: String, text: List[String])
    
    object YourParser extends RegexParsers {
    
      def subtitleHeader: Parser[SubtitleBlock] = {
        (subtitleNumber.? ~> time <~ arrow) ~ 
        time ~
        (opt(textLine) <~ eol)
      } ^^ {
        case startTime ~ endTime ~ _ => SubtitleBlock(startTime, endTime, Nil)
      }
    
      override val whiteSpace = "[ \t]+".r
      def arrow: Parser[String] = "-->".r
      def subtitleNumber: Parser[String] = "\\d+".r
      // Escape the dot so it matches a literal '.' rather than any character.
      def time: Parser[String] = "\\d{2}:\\d{2}:\\d{2}\\.\\d{3}".r
      def textLine: Parser[String] = ".*".r
      def eol: Parser[String] = "\n".r
    
      def parseStuff(s: String): scala.util.Either[String, SubtitleBlock] =
        parseAll(subtitleHeader, s) match {
          case Success(t, _) => scala.util.Right(t)
          case f => scala.util.Left(f.toString)
        }
    
      def main(args: Array[String]): Unit = {
        val examples: List[String] = List(
          "2 00:00:01.610 --> 00:00:02.620 align:start position:0%\n"
        ) ++ args.map(_ + "\n")
    
        for (x <- examples) {
          println(parseStuff(x))
        }
      }
    }
    

    This prints:

    Right(SubtitleBlock(00:00:01.610,00:00:02.620,List()))
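
As a rough way to check how Scala groups such an operator chain, the sketch below replaces the parsers with a hypothetical Expr class (not part of scala-parser-combinators) whose ~, ~> and <~ methods only record the grouping as a string. The names Expr, PrecedenceDemo, num and text are made up for illustration; the two chains mirror the failing parser from the question and the parenthesized version from the answer.

```scala
// Hypothetical stand-in for Parser: it does no parsing at all,
// it only records how the operators were grouped by the compiler.
final case class Expr(show: String) {
  def ~(that: Expr): Expr  = Expr(s"(${show} ~ ${that.show})")
  def ~>(that: Expr): Expr = Expr(s"(${show} ~> ${that.show})")
  def <~(that: Expr): Expr = Expr(s"(${show} <~ ${that.show})")
}

object PrecedenceDemo {
  val num   = Expr("num")
  val time  = Expr("time")
  val arrow = Expr("arrow")
  val text  = Expr("text")
  val eol   = Expr("eol")

  // Same operator chain as the failing parser in the question.
  val unparenthesized: Expr = num ~> time <~ arrow ~ time ~ text ~ eol

  // Same chain with the parentheses suggested in the answer.
  val parenthesized: Expr = (num ~> time <~ arrow) ~ time ~ (text <~ eol)

  def main(args: Array[String]): Unit = {
    println(unparenthesized.show)
    println(parenthesized.show)
  }
}
```

Running it prints ((num ~> time) <~ (((arrow ~ time) ~ text) ~ eol)) for the first chain, so the second time sits on the discarded side of <~. The parenthesized chain prints ((((num ~> time) <~ arrow) ~ time) ~ (text <~ eol)), where both times survive as operands of ~.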