For some simple ADTs, we can obtain the ADT of sets of that ADT as data Set = Full Set (*repeated N times) | Empty Set(*repeated N times)
, where N is the number of non terminal constructors of that ADT. For a more solid example, take Nat
:
data Nat = Succ Nat | Zero
We can obtain a type for sets of nats as follows:
data NatSet = Full NatSet | Empty NatSet
So, for example,
empty :: NatSet
empty = Empty empty
insert :: Nat -> NatSet -> NatSet
insert Zero (Empty natverse) = Full natverse
insert Zero (Full natverse) = Full natverse
insert (Succ nat) (Empty natverse) = Empty (insert nat natverse)
insert (Succ nat) (Full natverse) = Full (insert nat natverse)
member :: Nat -> NatSet -> Bool
member Zero (Full natverse) = True
member Zero (Empty natverse) = False
member (Succ nat) (Empty natverse) = member nat natverse
member (Succ nat) (Full natverse) = member nat natverse
main = do
let set = foldr insert empty [Zero, Succ (Succ Zero), Succ (Succ (Succ (Succ (Succ Zero))))]
print $ member Zero set
print $ member (Succ Zero) set
print $ member (Succ (Succ Zero)) set
print $ member (Succ (Succ (Succ Zero))) set
print $ member (Succ (Succ (Succ (Succ Zero)))) set
print $ member (Succ (Succ (Succ (Succ (Succ Zero))))) set
For other types, it is equally simple:
data Bin = A Bin | B Bin | C
data BinSet = Empty BinSet BinSet | Full BinSet BinSet
But what about types that branch? It isn't obvious to me:
data Tree = Branch Tree Tree | Tip
data TreeSet = ???
Is there any type-algebraic argument that maps ADTs to the ADTs of sets of that type?
Let's take another look at your set type.
data NatSet = Full NatSet | Empty NatSet
There's always another NatSet
inside this one; the two branches of the sum type indicate whether the current Nat
is present in the set. This is a structural representation of sets. As you observed, the structure of the set depends on the structure of the values.
It's equivalent to a stream of Booleans: we test for membership by indexing into the stream.
data NatSet = NatSet Bool NatSet
empty = NatSet False empty
insert Z (NatSet _ bs) = NatSet True bs
insert (S n) (NatSet b bs) = NatSet b (insert n bs)
(NatSet b _) `contains` Z = b
(NatSet _ bs) `contains` (S n) = bs `contains` n
Based on the insight that set membership is like indexing into a collection of Booleans, let's pursue a generic implementation of sets of values of types formed as the fixed point of a polynomial functor (what's the problem?).
First things first. As you observed, the set's structure depends on the type of the things inside it. Here's the class of things which can be elements of a set,
class El a where
data Set a
empty :: Set a
full :: Set a
insert :: a -> Set a -> Set a
remove :: a -> Set a -> Set a
contains :: Set a -> a -> Bool
For a first example, I'll modify NatSet
from above to suit this format.
instance El Nat where
data Set Nat = NatSet Bool (Set Nat)
empty = NatSet False empty
full = NatSet True empty
insert Z (NatSet _ bs) = NatSet True bs
insert (S n) (NatSet b bs) = NatSet b (insert n bs)
remove Z (NatSet _ bs) = NatSet False bs
remove (S n) (NatSet b bs) = NatSet b (remove n bs)
(NatSet b _) `contains` Z = b
(NatSet _ bs) `contains` (S n) = bs `contains` n
Another simple El
instance which we'll need later is ()
. A set of ()
s is either full or empty, with nothing in between.
instance El () where
newtype Set () = UnitSet Bool
empty = UnitSet False
full = UnitSet True
insert () _ = UnitSet True
remove () _ = UnitSet False
(UnitSet b) `contains` () = b
The fixed point of a functor f
,
newtype Fix f = Fix (f (Fix f))
in an instance of El
whenever f
is an instance of the following El1
class.
class El1 f where
data Set1 f a
empty1 :: El a => Set1 f (Set a)
full1 :: El a => Set1 f (Set a)
insert1 :: El a => f a -> Set1 f (Set a) -> Set1 f (Set a)
remove1 :: El a => f a -> Set1 f (Set a) -> Set1 f (Set a)
contains1 :: El a => Set1 f (Set a) -> f a -> Bool
instance El1 f => El (Fix f) where
newtype Set (Fix f) = FixSet (Set1 f (Set (Fix f)))
empty = FixSet empty1
full = FixSet full1
insert (Fix x) (FixSet s) = FixSet (insert1 x s)
remove (Fix x) (FixSet s) = FixSet (remove1 x s)
(FixSet s) `contains` (Fix x) = s `contains1` x
As usual we'll be composing bits of functors into bigger functors, before taking the resulting functor's fixed point to produce a concrete type.
newtype I a = I a
newtype K c a = K c
data (f :* g) a = f a :* g a
data (f :+ g) a = L (f a) | R (g a)
type N = K ()
For example,
type Nat = Fix (N :+ N)
type Tree = Fix (N :+ (I :* I))
Let's make sets of these things.
I
(for identity) is the location in the polynomial where Fix
will plug in the recursive substructure. We can just delegate its implementation of El_
to Fix
's.
instance El1 I where
newtype Set1 I a = ISet1 a
empty1 = ISet1 empty
full1 = ISet1 full
insert1 (I x) (ISet1 s) = ISet1 (insert x s)
remove1 (I x) (ISet1 s) = ISet1 (remove x s)
(ISet1 s) `contains1` (I x) = s `contains` x
The constant functor K c
features no recursive substructure, but it does feature a value of type c
. If c
can be put in a set, K c
can be too.
instance El c => El1 (K c) where
newtype Set1 (K c) a = KSet1 (Set c)
empty1 = KSet1 empty
full1 = KSet_ full
insert1 (K x) (KSet1 s) = KSet1 (insert x s)
remove1 (K x) (KSet1 s) = KSet1 (remove x s)
(KSet1 s) `contains1` (K x) = s `contains` x
Note that this definition makes Set1 N
isomorphic to Bool
.
For sums, let's use our intuition that testing for membership is like indexing. When you index into a tuple, you choose between the left-hand and right-hand members of the tuple.
instance (El1 f, El1 g) => El1 (f :+ g) where
data Set1 (f :+ g) a = SumSet1 (Set1 f a) (Set1 g a)
empty1 = SumSet1 empty1 empty1
full1 = SumSet1 full1 full1
insert1 (L x) (SumSet1 l r) = SumSet1 (insert1 x l) r
insert1 (R y) (SumSet1 l r) = SumSet1 l (insert1 y r)
remove1 (L x) (SumSet1 l r) = SumSet1 (remove1 x l) r
remove1 (R y) (SumSet1 l r) = SumSet1 l (remove1 y r)
(SumSet1 l r) `contains1` (L x) = l `contains1` x
(SumSet1 l r) `contains1` (R y) = r `contains1` y
This is now enough to do sets of natural numbers. With appropriate instances of Show
, you can do this:
ghci> let z = Fix (L (K ()))
ghci> let s n = Fix (R (I n))
ghci> insert (s z) empty `contains` z
False
ghci> insert (s z) empty `contains` (s z)
True
ghci> empty :: Set (Fix (N :+ I))
FixSet (SumSet1 (KSet1 (UnitSet False)) (ISet1 (FixSet ( -- to infinity and beyond
I still haven't answered your original question, which was how should this work for product types? I can think of a few strategies, but none of them actually work.
We could make Set1 (f :* g) a
a sum type. This has a pleasing symmetry: sets of sums are products, and sets of products are sums. In the context of the indexing idea, this is like saying "in order to get a Bool
out of the set, you have to give an index to handle each of the two possible cases", rather like Either a b
's elimination rule (a -> c) -> (b -> c) -> Either a b -> c
. However, you get stuck when you try to come up with meaningful values for empty1
and full1
:
instance (El1 f, El1 g) => El1 (f :* g) where
data Set1 (f :* g) a = LSet1 (Set1 f a) | RSet1 (Set1 g a)
insert1 (l :* r) (LSet1 x) = LSet1 (insert1 l x)
insert1 (l :* r) (RSet1 y) = RSet1 (insert1 r y)
remove1 (l :* r) (LSet1 x) = LSet1 (remove1 l x)
remove1 (l :* r) (RSet1 y) = RSet1 (remove1 r y)
(LSet1 x) `contains1` (l :* r) = x `contains1` l
(RSet1 y) `contains1` (l :* r) = y `contains1` r
empty1 = _
full1 = _
You could try adding hacky Empty
and Full
constructors to Set1 (f :* g)
but then you'd struggle to implement insert1
and remove1
.
You could interpret a set of a product type as a pair of the sets of the two halves of the product. An item is in the set if both of its halves are in their respective sets. Like a sort of generalised intersection.
instance (El1 f, El1 g) => El1 (f :* g) where
data Set1 (f :* g) a = ProdSet1 (Set1 f a) (Set1 g a)
insert1 (l :* r) (ProdSet1 s t) = ProdSet1 (insert1 l s) (insert1 r t)
remove1 (l :* r) (ProdSet1 s t) = ProdSet1 (remove1 l s) (remove1 r t)
(ProdSet1 s t) `contains1` (l :* r) = (s `contains1` l) && (t `contains1` r)
empty1 = ProdSet1 empty1 empty1
full1 = ProdSet1 full1 full1
But this implementation doesn't work correctly. Observe:
ghci> let tip = Fix (L (K ()))
ghci> let fork l r = Fix (R (I l :* I r))
ghci> let s1 = insert (fork tip tip) empty
ghci> let s2 = remove (fork tip (fork tip tip)) s1
ghci> s2 `contains` (fork tip tip)
False
Removing fork tip (fork tip tip)
also removed fork tip tip
. tip
got removed from the left-hand half of the set, which meant any tree whose left-hand branch is tip
got removed with it. We removed more items than we intended. (However, if you don't need a remove
operation on your sets, this implementation works - though that's just another disappointing asymmetry.) You could implement contains
with ||
instead of &&
but then you'll end up inserting more items than you intended.
Finally, I also thought about treating a set of products as a set of sets.
data Set1 (f :* g) a = ProdSet1 (Set1 f (Set1 g a))
This doesn't work - try implementing any of El1
's methods and you'll get stuck right away.
So at the end of the day your intuition was right. Product types are the problem. There's no meaningful structural representation of sets for product types.
Of course, this is all academic really. Fixed points of polynomial functors have a well-defined notion of equality and ordering, so you can do what everyone else does and use Data.Set.Set
to represent sets, even for product types. It won't have such good asymptotics, though: equality and ordering tests on such values are O(n) (where n is the size of the value), making membership operations on such sets O(n log m) (where m is the size of the set) because the set itself is represented as a balanced tree. Contrast with our generic structural sets, where membership operations are O(n).