Search code examples
haskellgenericsocamlgadttype-theory

What does GADT offer that cannot be done with OOP and generics?


Are GADTs in functional languages equivalent to traditional OOP + generics, or there is a scenario where there are correctness constrants easily enforced by GADT but hard or impossible to achieve using Java or C#?

For example, this "well-typed interpreter" Haskell program:

data Expr a where
  N :: Int -> Expr Int
  Suc :: Expr Int -> Expr Int
  IsZero :: Expr Int -> Expr Bool
  Or :: Expr Bool -> Expr Bool -> Expr Bool

eval :: Expr a -> a
eval (N n) = n
eval (Suc e) = 1 + eval e
eval (IsZero e) = 0 == eval e
eval (Or a b) = eval a || eval b

can be written equivalently in Java using generics and appropriate implementation of each subclass, though much more verbose:

interface Expr<T> {
    public <T> T eval();
}

class N extends Expr<Integer> {
    private Integer n;

    public N(Integer m) {
        n = m;
    }

    @Override public Integer eval() {
        return n;
    }
}

class Suc extends Expr<Integer> {
    private Expr<Integer> prev;

    public Suc(Expr<Integer> aprev) {
        prev = aprev;
    }

    @Override public Integer eval() {
        return 1 + prev.eval()
    }
}


/** And so on ... */

Solution

  • OOP classes are open, GADTs are closed (like plain ADTs).

    Here, "open" means you can always add more subclasses later, hence the compiler can not assume to have access to all the subclasses of a given class. (There are a few exceptions, e.g. Java's final which however prevents any subclassing, and Scala's sealed classes). Instead, ADTs are "closed" in the sense you can not add further constructors later on, and the compiler knows that (and can exploit it to check e.g. exhaustiveness). For more information, see the "expression problem".

    Consider the following code:

    data A a where
      A1 :: Char -> A Char
      A2 :: Int  -> A Int
    
    data B b where
      B1 :: Char   -> B Char
      B2 :: String -> B String
    
    foo :: A t -> B t -> Char
    foo (A1 x) (B1 y) = max x y
    

    The above code relies on Char being the only type t for which one can produce both A t and B t. GADTs, being closed, can ensure that. If we tried to mimick this using OOP classes we fail:

    class A1 extends A<Char>   ...
    class A2 extends A<Int>    ...
    class B1 extends B<Char>   ...
    class B2 extends B<String> ...
    
    <T> Char foo(A<T> a, B<T> b) {
          // ??
    }
    

    Here I think we can not implement the same thing unless resorting to unsafe type operations like type casts. (Moreover, these in Java don't even consider the parameter T because of type erasure.) We might think of adding some generic method to A or B to allow this, but this would force us to implement said method for Int and/or String as well.

    In this specific case, one might simply resort to a non generic function:

    Char foo(A<Char> a, B<Char> b) // ...
    

    or, equivalently, to adding a non generic method to those classes. However, the types shared between A and B might be a larger set than the singleton Char. Worse, classes are open, so the set can get larger as soon as one adds a new subclass.

    Also, even if you have a variable of type A<Char> you still do not know if that's a A1 or not, and because of that you can not access A1's fields except by using a type cast. The type cast here would be safe only because the programmer knows there's no other subclass of A<Char>. In the general case, this might be false, e.g.

    data A a where
      A1 :: Char -> A Char
      A2 :: t -> t -> A t
    

    Here A<Char> must be a superclass of both A1 and A2<Char>.


    @gsg asks in a comment about equality witnesses. Consider

    data Teq a b where
       Teq :: Teq t t
    
    foo :: Teq a b -> a -> b
    foo Teq x = x
    
    trans :: Teq a b -> Teq b c -> Teq a c
    trans Teq Teq = Teq
    

    This can be translated as

    interface Teq<A,B> {
      public B foo(A x);
      public <C> Teq<A,C> trans(Teq<B,C> x);
    }
    class Teq1<A> implements Teq<A,A> {
      public A foo(A x) { return x; }
      public <C> Teq<A,C> trans(Teq<A,C> x) { return x; }
    }
    

    The code above declares an interface for all the type pairs A,B, which is then implemented only in the case A=B (implements Teq<A,A>) by the class Teq1. The interface requires a conversion function foo from A to B, and a "transitivity proof" trans, which given this of type Teq<A,B> and an x of type Teq<B,C> can produce an object Teq<A,C>. This is the Java analogous of the Haskell code using GADTs right above.

    The class can not be safely implemented when A/=B, as far as I can see: it would require either returning nulls or cheating with non termination.