I'm using Scala extractors (i.e. a Regex inside a pattern match) to identify doubles and longs, as shown below.
My question is: why does the Regex apparently fail when used in a pattern match, while it clearly delivers the expected results when used in a chain of if/then/else expressions?
val LONG = """^(0|-?[1-9][0-9]*)$"""
val DOUBLE = """NaN|^-?(0(\.[0-9]*)?|([1-9][0-9]*\.[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?$"""
val scalaLONG : scala.util.matching.Regex = LONG.r
val scalaDOUBLE : scala.util.matching.Regex = DOUBLE.r
val types1 = Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map(text =>
  text match {
    case scalaLONG(long) => s"Long"
    case scalaDOUBLE(double) => s"Double"
    case _ => s"String"
  })
// Results types1: Seq[String] = List("String", "Long", "String", "String", "String")
val types2 = Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map(text =>
  if (scalaDOUBLE.findFirstIn(text).isDefined) "Double"
  else if (scalaLONG.findFirstIn(text).isDefined) "Long"
  else "String")
// Results types2: Seq[String] = List("String", "Long", "Double", "Double", "Double")
As you can see above, types2 delivers the expected results, whilst types1 reports "String" where "Double" is expected, apparently pointing to a failure in the Regex processing.
EDIT: With help from @alex-savitsky and @leo-c, I've arrived at the version below, which works as expected. However, I have to remember to provide an empty argument list in the pattern match, otherwise it gives wrong results. This looks error-prone to me.
val LONG = """^(?:0|-?[1-9][0-9]*)$"""
val DOUBLE = """^NaN|-?(?:0(?:\.[0-9]*)?|(?:[1-9][0-9]*\.[0-9]*)|(?:\.[0-9]+))(?:[Ee][+-]?[0-9]+)?$"""
val scalaLONG : scala.util.matching.Regex = LONG.r
val scalaDOUBLE : scala.util.matching.Regex = DOUBLE.r
val types1 = Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map(text =>
  text match {
    case scalaLONG() => s"Long"
    case scalaDOUBLE() => s"Double"
    case _ => s"String"
  })
// Results types1: Seq[String] = List("String", "Long", "Double", "Double", "Double")
val types2 = Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map(text =>
  if (scalaDOUBLE.findFirstIn(text).isDefined) "Double"
  else if (scalaLONG.findFirstIn(text).isDefined) "Long"
  else "String")
// Results types2: Seq[String] = List("String", "Long", "Double", "Double", "Double")
EDIT: OK... error-prone as it may be, this is an extractor pattern, which calls unapplySeq behind the scenes, so the pattern has to account for the capture groups that the regex defines. @alex-savitsky is using _* in his edit, which makes the intention of dropping all capture groups explicit. Looks good to me.
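To make the extractor behaviour concrete, here is a minimal sketch that calls unapplySeq directly on the DOUBLE regex from the question. The object and method names (UnapplySeqDemo, groups, oneBinder, anyBinders) are made up for illustration; only Regex.unapplySeq and the pattern syntax come from the standard library.

```scala
import scala.util.matching.Regex

object UnapplySeqDemo {
  // Same DOUBLE regex as in the question: it defines five capturing groups.
  val capturing: Regex =
    """NaN|^-?(0(\.[0-9]*)?|([1-9][0-9]*\.[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?$""".r

  // What a pattern match receives: one list element per capturing group
  // (null for groups that did not participate), or None if there is no match.
  def groups(text: String): Option[List[String]] = capturing.unapplySeq(text)

  // A single binder only fits a list of length 1, so this case cannot succeed here.
  def oneBinder(text: String): Boolean = text match {
    case capturing(_) => true
    case _            => false
  }

  // _* accepts a list of any length, so the group count no longer matters.
  def anyBinders(text: String): Boolean = text match {
    case capturing(_*) => true
    case _             => false
  }

  def main(args: Array[String]): Unit = {
    println(groups("3.0").map(_.length))
    println(oneBinder("3.0"))
    println(anyBinders("3.0"))
  }
}
```

The arity of the pattern is compared against the length of the list returned by unapplySeq, which is why the one-binder case silently falls through instead of failing to compile.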
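A pattern match has to match the whole input, while findFirstIn can match part of the input, sometimes resulting in more matches. Note that findFirstIn still honours the boundary markers ^ and $ wherever they appear in the regex; it does not add any of its own.
What actually breaks types1 is the number of capture groups. A pattern such as case scalaDOUBLE(double) binds exactly one group, but DOUBLE defines five, so that case can never match; LONG defines exactly one group, which is why "3" is still recognised. Moving ^ to the beginning of the regex, as in val DOUBLE = """^NaN|-?(0(\.[0-9]*)?|([1-9][0-9]*\.[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?$""", makes no difference to the pattern match, since it is anchored at both ends anyway. Once the groups are made non-capturing or are ignored in the pattern, types1 matches the types correctly.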
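How the two APIs treat anchors is easy to check in isolation. The sketch below uses two throwaway regexes (anchored and bare, both hypothetical names) rather than the ones from the question, so that the only variable is the presence of ^ and $.

```scala
object AnchorDemo {
  private val anchored = """^[0-9]+$""".r
  private val bare     = """[0-9]+""".r

  // findFirstIn searches for a matching substring; ^ and $ in the regex still apply.
  def findAnchored(text: String): Option[String] = anchored.findFirstIn(text)
  def findBare(text: String): Option[String]     = bare.findFirstIn(text)

  // A pattern match succeeds only if the regex matches the entire input,
  // even when the regex itself contains no anchors.
  def matchBare(text: String): Boolean = text match {
    case bare() => true
    case _      => false
  }

  def main(args: Array[String]): Unit = {
    println(findAnchored("abc123")) // anchors rule out the partial match
    println(findBare("abc123"))     // finds the digits inside the string
    println(matchBare("abc123"))    // whole-input match fails
    println(matchBare("123"))       // whole-input match succeeds
  }
}
```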
EDIT: Here's my test case for your question
object Test extends App {
  val regex = """^NaN|-?(?:0(?:\.[0-9]*)?|(?:[1-9][0-9]*\.[0-9]*)|(?:\.[0-9]+))(?:[Ee][+-]?[0-9]+)?$""".r
  println(Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map {
    case regex() => "Double"
    case _ => "String"
  })
}
results in List(String, String, Double, Double, Double)
As you see, the non-capturing groups make all the difference.
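One way to see the difference is to count the capturing groups of each regex through the underlying java.util.regex.Pattern. This is only a sketch; the helper captureGroups and the object name GroupCount are illustrative and not part of the Scala library.

```scala
import scala.util.matching.Regex

object GroupCount {
  // Number of capturing groups defined by the compiled pattern.
  def captureGroups(r: Regex): Int = r.pattern.matcher("").groupCount()

  val longRe: Regex = """^(0|-?[1-9][0-9]*)$""".r
  val capturing: Regex =
    """^NaN|-?(0(\.[0-9]*)?|([1-9][0-9]*\.[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?$""".r
  val nonCapturing: Regex =
    """^NaN|-?(?:0(?:\.[0-9]*)?|(?:[1-9][0-9]*\.[0-9]*)|(?:\.[0-9]+))(?:[Ee][+-]?[0-9]+)?$""".r

  def main(args: Array[String]): Unit = {
    println(captureGroups(longRe))       // one group: fits a pattern with one binder
    println(captureGroups(capturing))    // five groups: needs five binders or _*
    println(captureGroups(nonCapturing)) // no groups: fits an empty argument list
  }
}
```

The pattern in a case clause has to supply as many binders as the count reported here, unless it uses _* to accept any number.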
If you still want to use capturing groups, you can use _* to ignore the capture results:
object Test extends App {
  val regex = """^NaN|-?(0(\.[0-9]*)?|([1-9][0-9]*\.[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?$""".r
  println(Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map {
    case regex(_*) => "Double"
    case _ => "String"
  })
}
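Putting the pieces together, a classifier could look like the sketch below. It is one possible arrangement, not the only one: the names Classify, LongRe, DoubleRe, viaMatch and viaFind are invented here, and the DOUBLE regex is a variant of the one above in which the whole alternation is wrapped in a non-capturing group. That wrapping is an addition of mine so that ^ and $ apply to both alternatives, which matters for findFirstIn but not for the pattern match.

```scala
import scala.util.matching.Regex

object Classify {
  val LongRe: Regex = """^(?:0|-?[1-9][0-9]*)$""".r
  // The outer (?: ... ) makes ^ and $ cover both NaN and the numeric alternative.
  val DoubleRe: Regex =
    """^(?:NaN|-?(?:0(?:\.[0-9]*)?|[1-9][0-9]*\.[0-9]*|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?)$""".r

  // Pattern-match version: _* keeps working even if capture groups are added later.
  def viaMatch(text: String): String = text match {
    case LongRe(_*)   => "Long"
    case DoubleRe(_*) => "Double"
    case _            => "String"
  }

  // findFirstIn version: relies on the explicit anchors to reject partial matches.
  def viaFind(text: String): String =
    if (LongRe.findFirstIn(text).isDefined) "Long"
    else if (DoubleRe.findFirstIn(text).isDefined) "Double"
    else "String"

  def main(args: Array[String]): Unit = {
    val inputs = Seq("abc", "3", "3.0", "-3.0E-05", "NaN", "abc3.0")
    println(inputs.map(viaMatch))
    println(inputs.map(viaFind))
  }
}
```

With the regex from the test case above, findFirstIn would accept an input such as "abc3.0", because there the $ belongs only to the numeric alternative and nothing anchors its start; the grouped variant avoids that.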