User-defined tuple-based data constructors

Me trying to grok The Little MLer again. TLMLer has this SML code

datatype a pizza = Bottom | Topping of (a * (a pizza))
datatype fish = Anchovy | Lox | Tuna

which I've translated as

data PizzaSh a = CrustSh | ToppingSh a (PizzaSh a)
data FishPSh = AnchovyPSh | LoxPSh | TunaPSh

and then an alternative closer to TLMLer perhaps

data PizzaSh2 a = CrustSh2 | ToppingSh2 (a, PizzaSh2 a)

And from each I create a pizza

fpizza1 = ToppingSh AnchovyPSh (ToppingSh TunaPSh (ToppingSh LoxPSh CrustSh))
fpizza2 = ToppingSh2 (AnchovyPSh, ToppingSh2 (LoxPSh, ToppingSh2 (TunaPSh, CrustSh2)))

respectively, which are of type PizzaSh FishPSh and PizzaSh2 FishPSh respectively.

But the second version (which is arguably the closer to the original ML version) seems "offbeat." It's as if I'm creating a 2-tuple when I "cons" toppings together where the second member recursively expands . May I assume the parametric data constructor "function" of PizzaSh2 doesn't literally build a tuple, it's just borrowing the tuple as a cons strategy, correct? Which is preferable in Haskell, PizzaSh or PizzaSh2? As I understand, a tuple (cartesian product) data type will have a single constructor, e.g., data Point a b = Pt a b, not the disjoint union of ored-together (|) constructors. In SML the "*" indicates product, i.e., tuple, but, again, is this just a "tuple-like thing," i.e., it's just a tuple-looking way to cons a pizza together?

Solution

In Haskell, we prefer this style:

data PizzaSh a = CrustSh | ToppingSh a (PizzaSh a)

There is no need in Haskell to use a tuple there, since a data constructor like ToppingSh can take multiple arguments.

Using an additional pair as in

data PizzaSh2 a = CrustSh2 | ToppingSh2 (a, PizzaSh2 a)

creates a type which is almost isomorphic to the previous one, but is more cumbersome to handle since it requires to use more parentheses. E.g.

foo (ToppingSh x y)
-- vs
foo (ToppingSh2 (x, y))

bar :: PizzaSh a -> ...
bar (ToppingSh x y) = ....
-- vs
bar (ToppingSh2 (x, y)) = ...

Further, the type is indeed only almost isomorphic. When using an additional pair, because of laziness, we have one more value which can be represented in the type: we have a correpondence

ToppingSh x y     <->    ToppingSh2 (x, y)

which breaks down in the case

???               <->    ToppingSh2 undefined

That is, ToppinggSh2 can be applied to a non-terminating (or otherwise exceptional), pair-valued expression, and that constructs a value which can not be represented using ToppingSh.

Operationally, to achieve that GHC uses a double indirection (roughly pointer-to-pointer, or thunk-returning-pair-of-thunks), which further slows down the code. Hence, it's also a bad choice from a performance point of view, if one cares about such micro-optimizations.