Reuse and extend a defined type in OCaml

In OCaml, is there a simple construct or style for extending an already defined type?

Say we have the boolean type

type bool2 = True | False

Now we want to extend it for 3-valued logic. Is there a more elegant way in OCaml than redefining bool2 like this:

type bool3 = True | False | ThirdOne

Solution

I would advise against the immoderate use of polymorphic variants. They look nice on paper, but their more flexible inference and subtyping can come back to bite you at any time. When I use polymorphic variants, I try to make sure that every use is annotated with a precise constraining type expression.

I would suggest going back and modifying your old code, as it seems natural to do. If you have written your code with extensibility in mind, and in particular avoided wildcard (_) patterns on the bool2 type, then the compiler will warn you of every place where the code assumes that there are only two constructors. This compiler feedback on type modification is very useful, as it is a mechanical help in making your program correct.
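As a minimal sketch of what "avoiding wildcard patterns" buys you (the function name to_string and its return strings are made up for illustration), every constructor is listed explicitly here:

```ocaml
(* Hypothetical example: the extended type, with the new constructor added. *)
type bool3 = True | False | Third_one

(* Each constructor is matched explicitly; there is no `_` catch-all.
   If the Third_one case were left out, the compiler would report a
   non-exhaustive pattern-matching warning at this match. *)
let to_string (b : bool3) : string =
  match b with
  | True -> "true"
  | False -> "false"
  | Third_one -> "unknown"
```

Had the function been written with `| _ -> "false"` instead, it would have kept compiling silently after the type change, and the new constructor would have been treated as False without any warning.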

This way of doing things has of course some drawbacks. One of them is that modifying the type definition, then modifying each use site, may not fit well with your usual compile-test practice: on a large code base, you will have a significant amount of work to do before your project compiles cleanly again (and thus can be tested). You may split your modification into several patches in your version control system, but that means some intermediate committed states do not compile, which is not very pleasing. The other possibility is to change those places only to add a run-time failure (| Third_one -> assert false); then you have compilable code and you can correct those failures as they happen at run-time during the testing of the application. But I still think the static compiler feedback is a good help for code maintenance.
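The run-time failure stopgap could look roughly like this (the function negate is a hypothetical piece of old two-valued code, not something from the question):

```ocaml
type bool3 = True | False | Third_one

(* Old two-valued function, patched just enough to compile again.
   The new case fails at run-time until someone decides what it
   should really do. *)
let negate (b : bool3) : bool3 =
  match b with
  | True -> False
  | False -> True
  | Third_one -> assert false (* TODO: define the three-valued behaviour *)
```

Calling negate Third_one raises Assert_failure, so the remaining work shows up during testing instead of at compile time.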

There is also the option of wrapping the algebraic datatype in an "extended algebraic datatype", type bool3 = New | Old of bool2, which is discussed in the link you gave as a comment on Martin's answer. This may be a good way to transition from one datatype to the other without breaking compilation, but in the long term it is painful, especially if you stack more of those extensions on top of each other.
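A small sketch of the wrapping approach, assuming two hypothetical negation functions not2 and not3, shows both the appeal and the cost:

```ocaml
type bool2 = True | False

(* The extension wraps the old type instead of replacing it. *)
type bool3 = New | Old of bool2

(* Existing code on bool2 is left untouched. *)
let not2 : bool2 -> bool2 = function
  | True -> False
  | False -> True

(* New code reuses it, at the price of wrapping and unwrapping Old. *)
let not3 : bool3 -> bool3 = function
  | New -> New
  | Old b -> Old (not2 b)
```

Every value of the old type now has to be written Old True or Old False in three-valued code, and a further extension would add one more layer of wrapping.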

Of course, what would really be needed in some situations is a way to extend the datatype by code addition, instead of code modification, in a way that is both statically safe, easier to compile, run and test, and efficient at run-time. This is an instance of the Expression Problem, for which there are various solutions, polymorphic variants being one of them. But in the common case, you don't need the additional flexibility of doing code addition only, and it's not worth the additional complexity of the language features involved, so I would advise sticking with plain old variant types unless doing otherwise is demonstrably a huge gain.
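For comparison, here is a rough sketch of the polymorphic-variant route, with each function annotated with a precise closed type as recommended above (the type and function names are chosen for illustration only):

```ocaml
(* Closed polymorphic variant types; bool3 extends bool2 by addition. *)
type bool2 = [ `True | `False ]
type bool3 = [ bool2 | `Third_one ]

(* Existing code, written once against bool2. *)
let string_of_bool2 (b : bool2) : string =
  match b with
  | `True -> "true"
  | `False -> "false"

(* New code: the #bool2 pattern matches exactly the bool2 tags and
   delegates to the old function; only the new tag is handled here. *)
let string_of_bool3 (b : bool3) : string =
  match b with
  | #bool2 as b2 -> string_of_bool2 b2
  | `Third_one -> "unknown"
```

The old function is reused without modification, but without the annotations the inferred types become open and error messages get noticeably harder to read.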

PS: regarding polymorphic variants, and their relation to the expression problem, the obligatory paper is Code reuse through polymorphic variants by Jacques Garrigue.