Me trying to grok The Little MLer again. TLMLer has this SML code
datatype a pizza = Bottom | Topping of (a * (a pizza))
datatype fish = Anchovy | Lox | Tuna
which I've translated as
data PizzaSh a = CrustSh | ToppingSh a (PizzaSh a)
data FishPSh = AnchovyPSh | LoxPSh | TunaPSh
and then an alternative closer to TLMLer perhaps
data PizzaSh2 a = CrustSh2 | ToppingSh2 (a, PizzaSh2 a)
And from each I create a pizza
fpizza1 = ToppingSh AnchovyPSh (ToppingSh TunaPSh (ToppingSh LoxPSh CrustSh))
fpizza2 = ToppingSh2 (AnchovyPSh, ToppingSh2 (LoxPSh, ToppingSh2 (TunaPSh, CrustSh2)))
respectively, which are of type PizzaSh FishPSh
and PizzaSh2 FishPSh
respectively.
But the second version (which is arguably the closer to the original ML version) seems "offbeat." It's as if I'm creating a 2-tuple when I "cons" toppings together where the second member recursively expands . May I assume the parametric data constructor "function" of PizzaSh2
doesn't literally build a tuple, it's just borrowing the tuple as a cons strategy, correct? Which is preferable in Haskell, PizzaSh
or PizzaSh2
? As I understand, a tuple (cartesian product) data type will have a single constructor, e.g., data Point a b = Pt a b
, not the disjoint union of ored-together (|) constructors. In SML the "*" indicates product, i.e., tuple, but, again, is this just a "tuple-like thing," i.e., it's just a tuple-looking way to cons a pizza together?
In Haskell, we prefer this style:
data PizzaSh a = CrustSh | ToppingSh a (PizzaSh a)
There is no need in Haskell to use a tuple there, since a data constructor like ToppingSh
can take multiple arguments.
Using an additional pair as in
data PizzaSh2 a = CrustSh2 | ToppingSh2 (a, PizzaSh2 a)
creates a type which is almost isomorphic to the previous one, but is more cumbersome to handle since it requires to use more parentheses. E.g.
foo (ToppingSh x y)
-- vs
foo (ToppingSh2 (x, y))
bar :: PizzaSh a -> ...
bar (ToppingSh x y) = ....
-- vs
bar (ToppingSh2 (x, y)) = ...
Further, the type is indeed only almost isomorphic. When using an additional pair, because of laziness, we have one more value which can be represented in the type: we have a correpondence
ToppingSh x y <-> ToppingSh2 (x, y)
which breaks down in the case
??? <-> ToppingSh2 undefined
That is, ToppinggSh2
can be applied to a non-terminating (or otherwise exceptional), pair-valued expression, and that constructs a value which can not be represented using ToppingSh
.
Operationally, to achieve that GHC uses a double indirection (roughly pointer-to-pointer, or thunk-returning-pair-of-thunks), which further slows down the code. Hence, it's also a bad choice from a performance point of view, if one cares about such micro-optimizations.