Search code examples
haskellabstract-data-type

Is there any type-algebra function mapping an ADT to the set of elements of that ADT?


For some simple ADTs, we can obtain the ADT of sets of that ADT as data Set = Full Set (*repeated N times) | Empty Set(*repeated N times), where N is the number of non terminal constructors of that ADT. For a more solid example, take Nat:

data Nat = Succ Nat | Zero

We can obtain a type for sets of nats as follows:

data NatSet = Full NatSet | Empty NatSet

So, for example,

empty :: NatSet
empty = Empty empty

insert :: Nat -> NatSet -> NatSet
insert Zero (Empty natverse)       = Full natverse
insert Zero (Full natverse)        = Full natverse
insert (Succ nat) (Empty natverse) = Empty (insert nat natverse)
insert (Succ nat) (Full natverse)  = Full (insert nat natverse)

member :: Nat -> NatSet -> Bool 
member Zero (Full natverse)        = True
member Zero (Empty natverse)       = False
member (Succ nat) (Empty natverse) = member nat natverse
member (Succ nat) (Full natverse)  = member nat natverse

main = do
    let set = foldr insert empty [Zero, Succ (Succ Zero), Succ (Succ (Succ (Succ (Succ Zero))))]
    print $ member Zero set
    print $ member (Succ Zero) set
    print $ member (Succ (Succ Zero)) set
    print $ member (Succ (Succ (Succ Zero))) set
    print $ member (Succ (Succ (Succ (Succ Zero)))) set
    print $ member (Succ (Succ (Succ (Succ (Succ Zero))))) set

For other types, it is equally simple:

data Bin = A Bin | B Bin | C
data BinSet = Empty BinSet BinSet | Full BinSet BinSet

But what about types that branch? It isn't obvious to me:

data Tree = Branch Tree Tree | Tip
data TreeSet = ???

Is there any type-algebraic argument that maps ADTs to the ADTs of sets of that type?


Solution

  • Let's take another look at your set type.

    data NatSet = Full NatSet | Empty NatSet
    

    There's always another NatSet inside this one; the two branches of the sum type indicate whether the current Nat is present in the set. This is a structural representation of sets. As you observed, the structure of the set depends on the structure of the values.

    It's equivalent to a stream of Booleans: we test for membership by indexing into the stream.

    data NatSet = NatSet Bool NatSet
    
    empty = NatSet False empty
    
    insert Z (NatSet _ bs) = NatSet True bs
    insert (S n) (NatSet b bs) = NatSet b (insert n bs)
    
    (NatSet b _) `contains` Z = b
    (NatSet _ bs) `contains` (S n) = bs `contains` n
    

    Based on the insight that set membership is like indexing into a collection of Booleans, let's pursue a generic implementation of sets of values of types formed as the fixed point of a polynomial functor (what's the problem?).


    First things first. As you observed, the set's structure depends on the type of the things inside it. Here's the class of things which can be elements of a set,

    class El a where
        data Set a
        empty :: Set a
        full :: Set a
    
        insert :: a -> Set a -> Set a
        remove :: a -> Set a -> Set a
    
        contains :: Set a -> a -> Bool
    

    For a first example, I'll modify NatSet from above to suit this format.

    instance El Nat where
        data Set Nat = NatSet Bool (Set Nat)
    
        empty = NatSet False empty
        full = NatSet True empty
    
        insert Z (NatSet _ bs) = NatSet True bs
        insert (S n) (NatSet b bs) = NatSet b (insert n bs)
    
        remove Z (NatSet _ bs) = NatSet False bs
        remove (S n) (NatSet b bs) = NatSet b (remove n bs)
    
        (NatSet b _) `contains` Z = b
        (NatSet _ bs) `contains` (S n) = bs `contains` n
    

    Another simple El instance which we'll need later is (). A set of ()s is either full or empty, with nothing in between.

    instance El () where
        newtype Set () = UnitSet Bool
        empty = UnitSet False
        full = UnitSet True
        insert () _ = UnitSet True
        remove () _ = UnitSet False
        (UnitSet b) `contains` () = b
    

    The fixed point of a functor f,

    newtype Fix f = Fix (f (Fix f))
    

    in an instance of El whenever f is an instance of the following El1 class.

    class El1 f where
        data Set1 f a
        empty1 :: El a => Set1 f (Set a)
        full1 :: El a => Set1 f (Set a)
        insert1 :: El a => f a -> Set1 f (Set a) -> Set1 f (Set a)
        remove1 :: El a => f a -> Set1 f (Set a) -> Set1 f (Set a)
        contains1 :: El a => Set1 f (Set a) -> f a -> Bool
    
    instance El1 f => El (Fix f) where
        newtype Set (Fix f) = FixSet (Set1 f (Set (Fix f)))
        empty = FixSet empty1
        full = FixSet full1
        insert (Fix x) (FixSet s) = FixSet (insert1 x s)
        remove (Fix x) (FixSet s) = FixSet (remove1 x s)
        (FixSet s) `contains` (Fix x) = s `contains1` x
    

    As usual we'll be composing bits of functors into bigger functors, before taking the resulting functor's fixed point to produce a concrete type.

    newtype I a = I a
    newtype K c a = K c
    data (f :* g) a = f a :* g a
    data (f :+ g) a = L (f a) | R (g a)
    type N = K ()
    

    For example,

    type Nat = Fix (N :+ N)
    type Tree = Fix (N :+ (I :* I))
    

    Let's make sets of these things.

    I (for identity) is the location in the polynomial where Fix will plug in the recursive substructure. We can just delegate its implementation of El_ to Fix's.

    instance El1 I where
        newtype Set1 I a = ISet1 a
        empty1 = ISet1 empty
        full1 = ISet1 full
        insert1 (I x) (ISet1 s) = ISet1 (insert x s)
        remove1 (I x) (ISet1 s) = ISet1 (remove x s)
        (ISet1 s) `contains1` (I x) = s `contains` x
    

    The constant functor K c features no recursive substructure, but it does feature a value of type c. If c can be put in a set, K c can be too.

    instance El c => El1 (K c) where
        newtype Set1 (K c) a = KSet1 (Set c)
        empty1 = KSet1 empty
        full1 = KSet_ full
        insert1 (K x) (KSet1 s) = KSet1 (insert x s)
        remove1 (K x) (KSet1 s) = KSet1 (remove x s)
        (KSet1 s) `contains1` (K x) = s `contains` x
    

    Note that this definition makes Set1 N isomorphic to Bool.

    For sums, let's use our intuition that testing for membership is like indexing. When you index into a tuple, you choose between the left-hand and right-hand members of the tuple.

    instance (El1 f, El1 g) => El1 (f :+ g) where
        data Set1 (f :+ g) a = SumSet1 (Set1 f a) (Set1 g a)
        empty1 = SumSet1 empty1 empty1
        full1 = SumSet1 full1 full1
        insert1 (L x) (SumSet1 l r) = SumSet1 (insert1 x l) r
        insert1 (R y) (SumSet1 l r) = SumSet1 l (insert1 y r)
        remove1 (L x) (SumSet1 l r) = SumSet1 (remove1 x l) r
        remove1 (R y) (SumSet1 l r) = SumSet1 l (remove1 y r)
        (SumSet1 l r) `contains1` (L x) = l `contains1` x
        (SumSet1 l r) `contains1` (R y) = r `contains1` y
    

    This is now enough to do sets of natural numbers. With appropriate instances of Show, you can do this:

    ghci> let z = Fix (L (K ()))
    ghci> let s n = Fix (R (I n))
    ghci> insert (s z) empty `contains` z
    False
    ghci> insert (s z) empty `contains` (s z)
    True
    ghci> empty :: Set (Fix (N :+ I))
    FixSet (SumSet1 (KSet1 (UnitSet False)) (ISet1 (FixSet (  -- to infinity and beyond
    

    I still haven't answered your original question, which was how should this work for product types? I can think of a few strategies, but none of them actually work.

    We could make Set1 (f :* g) a a sum type. This has a pleasing symmetry: sets of sums are products, and sets of products are sums. In the context of the indexing idea, this is like saying "in order to get a Bool out of the set, you have to give an index to handle each of the two possible cases", rather like Either a b's elimination rule (a -> c) -> (b -> c) -> Either a b -> c. However, you get stuck when you try to come up with meaningful values for empty1 and full1:

    instance (El1 f, El1 g) => El1 (f :* g) where
        data Set1 (f :* g) a = LSet1 (Set1 f a) | RSet1 (Set1 g a)
        insert1 (l :* r) (LSet1 x) = LSet1 (insert1 l x)
        insert1 (l :* r) (RSet1 y) = RSet1 (insert1 r y)
        remove1 (l :* r) (LSet1 x) = LSet1 (remove1 l x)
        remove1 (l :* r) (RSet1 y) = RSet1 (remove1 r y)
        (LSet1 x) `contains1` (l :* r) = x `contains1` l
        (RSet1 y) `contains1` (l :* r) = y `contains1` r
        empty1 = _
        full1 = _
    

    You could try adding hacky Empty and Full constructors to Set1 (f :* g) but then you'd struggle to implement insert1 and remove1.

    You could interpret a set of a product type as a pair of the sets of the two halves of the product. An item is in the set if both of its halves are in their respective sets. Like a sort of generalised intersection.

    instance (El1 f, El1 g) => El1 (f :* g) where
        data Set1 (f :* g) a = ProdSet1 (Set1 f a) (Set1 g a)
        insert1 (l :* r) (ProdSet1 s t) = ProdSet1 (insert1 l s) (insert1 r t)
        remove1 (l :* r) (ProdSet1 s t) = ProdSet1 (remove1 l s) (remove1 r t)
        (ProdSet1 s t) `contains1` (l :* r) = (s `contains1` l) && (t `contains1` r)
        empty1 = ProdSet1 empty1 empty1
        full1 = ProdSet1 full1 full1
    

    But this implementation doesn't work correctly. Observe:

    ghci> let tip = Fix (L (K ()))
    ghci> let fork l r = Fix (R (I l :* I r))
    ghci> let s1 = insert (fork tip tip) empty
    ghci> let s2 = remove (fork tip (fork tip tip)) s1
    ghci> s2 `contains` (fork tip tip)
    False
    

    Removing fork tip (fork tip tip) also removed fork tip tip. tip got removed from the left-hand half of the set, which meant any tree whose left-hand branch is tip got removed with it. We removed more items than we intended. (However, if you don't need a remove operation on your sets, this implementation works - though that's just another disappointing asymmetry.) You could implement contains with || instead of && but then you'll end up inserting more items than you intended.

    Finally, I also thought about treating a set of products as a set of sets.

    data Set1 (f :* g) a = ProdSet1 (Set1 f (Set1 g a))
    

    This doesn't work - try implementing any of El1's methods and you'll get stuck right away.

    So at the end of the day your intuition was right. Product types are the problem. There's no meaningful structural representation of sets for product types.

    Of course, this is all academic really. Fixed points of polynomial functors have a well-defined notion of equality and ordering, so you can do what everyone else does and use Data.Set.Set to represent sets, even for product types. It won't have such good asymptotics, though: equality and ordering tests on such values are O(n) (where n is the size of the value), making membership operations on such sets O(n log m) (where m is the size of the set) because the set itself is represented as a balanced tree. Contrast with our generic structural sets, where membership operations are O(n).