I have a combinator and a result converter that looks like so:
// parses a line like so:
//
// 2
// 00:00:01.610 --> 00:00:02.620 align:start position:0%
//
private def subtitleHeader: Parser[SubtitleBlock] = {
(subtitleNumber ~ whiteSpace).? ~>
time ~ arrow ~ time ~ opt(textLine) ~ eol
} ^^ {
case
startTime ~ _ ~ endTime ~ _ ~ _
=> SubtitleBlock(startTime, endTime, List(""))
}
Because the arrow
, textline
and eol
are not important to my result converter, I was hoping I could use <~
and ~>
in the right places within my combinator such that my converter doesn't have to deal with them. As an experiment, I changed the first ~
in the parser to <~ and removed the ~ _
where the "arrow" would be matched in the case
statement like so:
private def subtitleHeader: Parser[SubtitleBlock] = {
(subtitleNumber ~ whiteSpace).? ~>
time <~ arrow ~ time ~ opt(textLine) ~ eol
} ^^ {
case
startTime ~ endTime ~ _ ~ _
=> SubtitleBlock(startTime, endTime, List(""))
}
However, I get red-squigglies in IntelliJ with the error message:
Error:(44, 31) constructor cannot be instantiated to expected type; found : caption.vttdissector.VttParsers.~[a,b] required: Int startTime ~ endTime ~ _ ~ _
What am I doing wrong?
Since you didn't insert any parentheses in the chain of ~
and <~
, most matched subexpressions are thrown out "with the bathwater" (or rather "with the whitespace and arrows"). Just insert some parentheses.
Here is the general pattern what it should look like:
(irrelevant ~> irrelevant ~> RELEVANT <~ irrelevant <~ irrelevant) ~
(irrelevant ~> RELEVANT <~ irrelevant <~ irrelevant) ~
...
i.e. every "relevant" subexpression is surrounded by irrelevant stuff and a pair of parentheses, and then the parenthesized subexpressions are connected by ~
's.
Your example:
import scala.util.parsing.combinator._
import scala.util.{Either, Left, Right}
case class SubtitleBlock(startTime: String, endTime: String, text: List[String])
object YourParser extends RegexParsers {
def subtitleHeader: Parser[SubtitleBlock] = {
(subtitleNumber.? ~> time <~ arrow) ~
time ~
(opt(textLine) <~ eol)
} ^^ {
case startTime ~ endTime ~ _ => SubtitleBlock(startTime, endTime, Nil)
}
override val whiteSpace = "[ \t]+".r
def arrow: Parser[String] = "-->".r
def subtitleNumber: Parser[String] = "\\d+".r
def time: Parser[String] = "\\d{2}:\\d{2}:\\d{2}.\\d{3}".r
def textLine: Parser[String] = ".*".r
def eol: Parser[String] = "\n".r
def parseStuff(s: String): scala.util.Either[String, SubtitleBlock] =
parseAll(subtitleHeader, s) match {
case Success(t, _) => scala.util.Right(t)
case f => scala.util.Left(f.toString)
}
def main(args: Array[String]): Unit = {
val examples: List[String] = List(
"2 00:00:01.610 --> 00:00:02.620 align:start position:0%\n"
) ++ args.map(_ + "\n")
for (x <- examples) {
println(parseStuff(x))
}
}
}
finds:
Right(SubtitleBlock(00:00:01.610,00:00:02.620,List()))