ocaml gadt subtyping polymorphic-variants

Polymorphic variant --> GADT?

I'm writing a Prolog system and I am using polymorphic variants to represent Prolog terms.

In particular, I use polymorphic variants (instead of regular ones) so I can do subtyping while ensuring OCaml's excellent exhaustiveness checks of matches on subtypes.

So far, so good!

On multiple occasions I have been reading about GADTs (on discuss.ocaml.org, and in realworldocaml.org). To me, GADTs appear to offer similar features, but with a smaller memory footprint: polymorphic variants with cases having more than one argument require an extra pointer that regular variants do not need.

So far, I have not been able to use GADTs successfully, so here's my question:

Is there a simple, straightforward way to convert code using polymorphic variants to GADTs? It this even possible in the general case?

Thank you in advance!

Solution

Is there a simple, straightforward way to convert code using polymorphic variants to GADTs? Is this even possible in the general case?

No, because they, in general, serve completely different purposes.

Polymorphic variants provide subtyping for sum types. GADT enable constraints for sum type variants.

You can, however, use both techniques to partition a sum type into a quotient set of types that comprise your super type. You can even use phantom polymorphic variants as the witness type for each subset.

Let's do some coding, imagine we have a set of figures, which we would like to subdivide into two non-intersecting subsets of circles and rectangles.

Using the polymorphic variants,

  type circle = [ `circle of int]
  type rectangle = [`rectangle of int * int]
  type figure = [circle | rectangle]

and the same using GADT

  type circle = Is_circle
  type rectangle = Is_rectangle
  type _ figure =
    | Circle : int -> circle figure
    | Rectangle : int * int -> rectangle figure

Note, how the constraint is explicitly expressed with circle and rectangle type. We could even use polymorphic variants to witness the constraint. Anything will work, as long as the type checker could distinguish the two types in the constraint (i.e., abstract types won't work as their implementation might be equal).

Now let's make some operations that involve subtyping. Let's start with polymorphic variants

  let equal_circle (`circle r1) (`circle r2) = r1 = r2

  let equal_rectangle (`rectangle (x1,y1)) (`rectangle (x2,y2)) =
    x1 = x2 && y1 = y2

  let equal_figure x y = match x,y with
    | (#circle as x), (#circle as y) ->
      equal_circle x y
    | (#rectangle as x), (#rectangle as y) ->
      equal_rectangle x y
    | #rectangle, #circle
    | #circle, #rectangle -> false

So far so good. A small caveat is that our equal_circle and equal_rectangle functions are a bit too polymorphic, but that could be easily solved by adding a proper type constraint or having a module signature (we are always using module signatures, right?).

Now let's implement the same with GADT, slowly,

  let equal_circle (Circle x) (Circle y) = x = y

  let equal_rectangle (Rectangle (x1,y1)) (Rectangle (x2,y2)) =
    x1 = x2 && y1 = y2

Looks the same as the poly example, modulo small syntactic differences. The type is also looking nice and precise, val equal_circle : circle figure -> circle figure -> bool. No need for extra constraints, the type checker is doing our work for us.

OK, now let's try to write the super method, the first attempt will not work:

  (* not accepted by the typechecker *)
  let equal_figure x y = match x,y with
    | (Circle _ as x), (Circle _ as y) ->
      equal_circle x y
    | (Rectangle _ as x), (Rectangle _ as y) ->
      equal_rectangle x y

With GADT the type checker by default will pick a concrete type index, so instead of ascribing the 'a figure -> 'b figure -> bool type, the type checker will select arbitrary an index, in our case, it is circle and complain that rectangle figure is not a circle figure. No problem, we can explicitly say that we want to allow the comparison of arbitrary figures,

  let equal_figure : type k. k figure -> k figure -> bool =
    fun x y -> match x,y with
      | (Circle _ as x), (Circle _ as y) ->
        equal_circle x y
      | (Rectangle _ as x), (Rectangle _ as y) ->
        equal_rectangle x y

This type k says to the type checker: "generalize all occurrences of k". So we now have a method that has a slightly different typing than the corresponding method implemented with polymorphic variants. It can compare figures of an equal kind but not figures of different kinds, i.e., it has type 'a figure -> 'a figure -> bool. The thing that you can't express with polymorphic variants, indeed, the same implementation with polyvariants is non-exhaustive,

  let equal_figure x y = match x,y with (* marked as non-exhaustive *)
    | (#circle as x), (#circle as y) ->
      equal_circle x y
    | (#rectangle as x), (#rectangle as y) ->
      equal_rectangle x y

We can, of course, implement a more generic method that enables comparison of arbitrary figures using GADT, e.g., the following definition has type 'a figure -> 'b figure -> bool

  let equal_figure' : type k1 k2. k1 figure -> k2 figure -> bool =
    fun x y -> match x,y with
      | (Circle _ as x), (Circle _ as y) ->
        equal_circle x y
      | (Rectangle _ as x), (Rectangle _ as y) ->
        equal_rectangle x y
      | Rectangle _, Circle _
      | Circle _, Rectangle _ -> false

We immediately see that for our purposes GADT is a more powerful tool that gives us more control on the constraints. And given that we can use polymorphic variants and object types as type indices for expressing constraints, we can have the best from the two worlds - fine-grained constraints and subtyping.

To be honest, you can achieve the same result as with GADT but without GADT, using the tagless-final style. But this is an implementation detail, which sometimes is important in practice. The main issue with GADT is that they are not serializable. Indeed, you can't store the phantom type.

To conclude, no matter if you're using GADT or Tagless-Final Style, you have much more control over the type constraints and can express your intent more clearly (and let the type checker enforce it). We are using it a lot in BAP to express the complex constraints off well-formed programs, e.g., that bitvector operations are applied to the vectors of the same length. This enables us to disregard ill-formed programs in our analysis and saves us a few lines of code and a few hours of debugging.

The answer, even with that simple example, has already grown too big so I have to stop. I would suggest you visiting discuss.ocaml.org and search for GADT and Polymorphic Variants there. I remember that there were a few more thorough discussions there.

Some deeper discussions from dicuss.ocaml.org