types functional-programming language-agnostic theory discriminated-union

Structural typing for sum types

For product types, nominal versus structural typing is a design decision with a straightforward interpretation in each case. You can define two record types with the same fields in the same order but different names; either they are compatible or they are not, and it's easy to see how each possibility leads to a coherent type system.

It's not clear to me that the same applies to sum types. The whole point of those is that you keep the tag names, so that you can create values and later distinguish them. But I have not been able to find any mention of this problem in discussions of nominal versus structural typing.

Is it the case that:

Of course nominal versus structural typing only applies to product types; sum types have to be nominal, and this is obvious enough to go without saying, so no one bothers mentioning it.
Actually, sum types can be structural, in the following way that I had not thought of...
Something else?

Solution

From chatting with you about this, what I understand you to be asking is whether a new language can treat equivalent sum types as identical. For example, if the syntax is ML-like, you might define

data Val  = Unparsed String | Parsed Int
data File = Filename String | FileDescriptor Int

You have exactly the same choices here as you do for product types: Val and File can be considered the same type, convertible types, or unrelated types. Let’s go through some of the options.

Note that the runtime must track which alternative of the sum type is active, but this does not need to be a type name or type ID (unless the language provides a way to directly query the active alternative). The language could do all type-checking statically, number the alternatives with arbitrary integers, and have the runtime compare against this value (e.g. do a binary search for a case matching 0x02), use it as an index into a function table, or something else.
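As a rough sketch of the function-table idea, assuming a hypothetical compiler that numbers the constructors 0 and 1: the names Lowered, lower, table and describe below are invented for illustration, and a real compiler would do this beneath the source language rather than in it.

```haskell
data Val = Unparsed String | Parsed Int

-- Hypothetical lowered form: an integer tag plus payload slots.
-- Only the slot selected by the tag is meaningful.
data Lowered = Lowered { tag :: Int, strSlot :: String, intSlot :: Int }

-- Constructor names are replaced by arbitrary small integers.
lower :: Val -> Lowered
lower (Unparsed s) = Lowered 0 s 0
lower (Parsed n)   = Lowered 1 "" n

-- A function table indexed by tag: entry 0 handles the first
-- alternative, entry 1 the second.
table :: [Lowered -> String]
table =
  [ \l -> "unparsed " ++ strSlot l
  , \l -> "parsed " ++ show (intSlot l)
  ]

-- Dispatch consults only the integer tag, never a constructor name.
describe :: Val -> String
describe v = let l = lower v in (table !! tag l) l
```

Nothing in describe depends on the names Unparsed and Parsed once the value has been lowered, which is the sense in which the tag need not be a type name.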

Structural Typing

One simple implementation would be to treat the two types structurally, as a static form of duck typing. It would be strange to write a functional language where you can pass a File to any function that takes a Val, but it would work: the language would look up the definitions, see that they are equivalent, and consider them aliases of each other. It would probably astonish the programmer a lot less if the language also required the alternatives to have the same names.
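One way to picture the structural reading, as a sketch rather than a description of any existing language: erase the names and keep only the shape, "a String or an Int". In Haskell the anonymous sum Either String Int can stand in for that shape. The names Shape, valShape, fileShape and render are invented for illustration.

```haskell
data Val  = Unparsed String | Parsed Int
data File = Filename String | FileDescriptor Int

-- The shared shape with all names erased: "a String or an Int".
type Shape = Either String Int

-- Each nominal type maps onto the shape by position of its alternatives.
valShape :: Val -> Shape
valShape (Unparsed s) = Left s
valShape (Parsed n)   = Right n

fileShape :: File -> Shape
fileShape (Filename s)       = Left s
fileShape (FileDescriptor n) = Right n

-- A function written against the shape serves both types.
render :: Shape -> String
render = either id show
```

A structurally typed language would in effect apply valShape and fileShape implicitly. Here the alternatives are matched by position, which is exactly the behavior that could surprise a programmer if the constructor names differ.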

Nominal Typing

If you try to do the equivalent in Haskell, it will tell you that these are two different types. To convert one to the other, you would need to write a function that unpacks and repacks, such as

fromVal :: Val -> File
fromVal (Unparsed path) = Filename path
fromVal (Parsed fd)     = FileDescriptor fd

The Mushy Middle

The conversion function I wrote above is clearly suboptimal, because the two types have exactly the same layouts and implementations. You don’t need to do any actual work to convert one to the other.

The language might take a middle ground here: you must explicitly convert between the types, but the conversion is a no-op. A step even further out into ambivalence might be to require a declaration somewhere to enable this trivial conversion, somewhat like a default constructor in C++ or deriving in Haskell. The compiler would be able to write it automatically.
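Haskell's Data.Coerce is one existing mechanism in this spirit: the conversion must be written explicitly, but it costs nothing at runtime. It only sees through newtypes, so the sketch below assumes both types are declared as newtype wrappers around Either String Int instead of as two separate data declarations. The function names are invented for illustration.

```haskell
import Data.Coerce (coerce)

-- Both types wrap the same underlying sum, so they share a representation.
newtype Val  = Val  (Either String Int) deriving (Eq, Show)
newtype File = File (Either String Int) deriving (Eq, Show)

-- Explicit in the source, but no unpacking or repacking at runtime.
valToFile :: Val -> File
valToFile = coerce

fileToVal :: File -> Val
fileToVal = coerce

-- The same trick lifts through containers without traversing them.
valsToFiles :: [Val] -> [File]
valsToFiles = coerce
```

The types stay distinct for type-checking purposes: passing a Val where a File is expected is still an error unless the call to coerce is written out.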

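This is common in imperative languages. For example, in C, if two struct types are members of the same union and share a common initial sequence of compatible fields (C++ calls such types “layout-compatible”), reading those shared fields through either member is guaranteed to work. A lot of code goes further and type-puns by casting pointers between such structs, relying on the same layout guarantee.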

The ubiquitous BSD socket API relies on this to implement struct sockaddr as what is in effect a sum type, with the leading address-family field acting as the tag. As a side effect, though, if you implemented a new network protocol whose address had a 32-bit field and a 16-bit field, the language would consider that compatible with an IPv4 address and TCP or UDP port number. Since type compatibility is structural, there is no way to disable this (or even to get the language to stop you from shooting yourself in the foot, since the way to type-pun is to override all type checking). But kernel-level programming often needs this kind of type-punning.