scala pattern-matching union-types scala-3 dotty

Pattern matching against union type can't remove cases from consideration

Suppose I have a type which is either a string or a tuple of strings.

type OneOrTwo = String | (String, String)

Now I want to discriminate between these two types. The runtime representations are distinct (String vs. Tuple), so I should be able to do this.

def getFirst(x: OneOrTwo): String =
  x match {
    // Tuple case
    case (a, _) => a
    // String case
    case y => y
  }

Strangely, this doesn't work. The tuple pattern is irrefutable (for tuples), so if that pattern fails, then in the second case y should be of type String, based on my understanding. But Scala reports that y is of type OneOrTwo and can't be returned from a function expecting a String.

The strange thing is that Scala seems to understand that this pattern is irrefutable. If we add a : String annotation

def getFirst(x: OneOrTwo): String =
  x match {
    // Tuple case
    case (a, _) => a
    // String case
    case y: String => y
  }

Then the code compiles successfully and we don't get a non-exhaustiveness warning, since Scala knows that every tuple will match Case 1 and every string will match Case 2.

So why do we need this annotation? Why can't Scala perform that final step of reasoning that "if we don't have a tuple and the type is String | (String, String), then it must be a string"?

Solution

I have to disagree. Having the compiler perform yet another invisible type cast (as if the whole type erasure process were not opaque enough) could quickly lead to more unintuitive type-checking issues later and make the compiler even less transparent.

Union types have no runtime representation of their own: they are erased at compile time to the erased least upper bound (LUB) of their members. Here that should be Object, since the two types have nothing in common except being Serializable, and the erasure rule prefers a class over a trait.
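To make the erasure point concrete, here is a minimal sketch, assuming Scala 3. The object name RuntimeShape and the helper runtimeClassName are made up for illustration. It only shows that a value of the union type is a plain String or a plain Tuple2 at runtime, with no wrapper around it.

```scala
object RuntimeShape {
  type OneOrTwo = String | (String, String)

  // Hypothetical helper: reports the runtime class of a union-typed value.
  // The value is upcast to Any first so that getClass is resolved on Any.
  def runtimeClassName(x: OneOrTwo): String = {
    val asAny: Any = x
    asAny.getClass.getName
  }
}
```

Calling RuntimeShape.runtimeClassName("a") gives java.lang.String, and calling it with ("a", "b") gives scala.Tuple2, so the two alternatives stay distinguishable by a runtime type test.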

So making explicit type tests in the pattern match the programmer's responsibility gives you more flexibility and control, and makes the code much easier to reason about, both for your teammates and for you when you come back to the same code after a while and have forgotten what the second case of that union was supposed to be.
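As a sketch of what that looks like in practice (assuming Scala 3; the wrapper object ExplicitPatterns is only there to keep the example self-contained), each alternative of the union gets its own pattern, so the type of every bound variable is visible at the match site:

```scala
object ExplicitPatterns {
  type OneOrTwo = String | (String, String)

  def getFirst(x: OneOrTwo): String =
    x match {
      // Tuple case: a is bound to the first component.
      case (a, _) => a
      // String case: the type test states what remains of the union.
      case s: String => s
    }
}
```

For example, ExplicitPatterns.getFirst("one") returns "one" and ExplicitPatterns.getFirst(("first", "second")) returns "first".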

Indeed, in this example the compiler could do that fairly easily. It sort of does already, as its exhaustiveness checking shows. But that is one particular case where it would actually work. The benefit does not outweigh the newly created issue: I would still have to double-check the type alias to see what the other type in the union is, just to figure out what type the second case of the pattern match should have.

Note that the rule for pattern matching on union types is stated precisely:

If the selector of a pattern match is a union type, the match is considered exhaustive if all parts of the union are covered.
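A small sketch of that rule, assuming Scala 3 (the names ExhaustiveUnion and describe are made up for illustration): the match below names both parts of the union, so it counts as exhaustive. If one of the two cases were removed, the compiler would be expected to report that the match may not be exhaustive.

```scala
object ExhaustiveUnion {
  type OneOrTwo = String | (String, String)

  def describe(x: OneOrTwo): String =
    x match {
      // Covers the String part of the union.
      case s: String => s"one: $s"
      // Covers the (String, String) part of the union.
      case (a, b) => s"two: $a, $b"
    }
}
```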

The motivation seems to be the same as for type inference: when inferring the result type of a definition (val, var, or def), if the type about to be inferred is a union type, it is replaced by its join, because inferring types which are "too precise" can lead to unintuitive typechecking issues later on.