Search code examples
haskelltypesbagcommutativity

Truly unordered foldable bag


I want a Bag container which hides its 'real' order from its clients.

It also must be fully polymorphic, that is shouldn't require any constraints over its element type.

I found at least three implementations of bags: Bag module from ghc package, Data.Bag from bag and Math.Combinatorics.Multiset from multiset-comb.

However, they all have toList and fold* operations which expose the internal order of elements which may depend on implementation details or the order of bag construction.

toList is impossible, at least with type Bag a -> [a]. However, folding does not always expose the order.

For example, fold (+) 0 does not expose.

The question is, how should I design the folding interface? Is there a necessary and sufficient condition for safety of the a -> a -> a folding function? As fmap does not expose the order, is there a loss of genericity from folding with a -> b -> b?

I'm thinking of commutative monoids - they seem sufficient, but I'm not sure if associativity and identity element are necessary.


Solution

  • An identity is probably necessary if your bags can be empty - you have to return something in that case, and if you want your fold to be a homomorphism (so that combining the results of folding some bags is the same as folding a bag made by combining the bags, which is a pretty natural property to expect), it must be an identity element.

    Associativity, likewise, is a good idea. Suppose I have a type and operation like so:

    data T a = Zero | One a | Two (T a) (T a)
      deriving (Eq, Ord, Show)
    
    (+-+) :: Ord a => T a -> T a -> T a
    Zero +-+ x = x
    x +-+ Zero = x
    a +-+ b = Two (min a b) (max a b)
    

    Clearly (+-+) is commutative and has an identity, but is non-associative. Suppose I then implement a bag as a list:

    newtype Bag a = Bag [a]
    
    -- pre-condition: f is commutative and z is an identity for it
    foldBag :: (a -> a -> a) -> a -> Bag a -> a
    foldBag f z (Bag xs) = foldr f z xs
    
    foldNonAssoc :: (Ord a) => Bag (T a) -> T a
    foldNonAssoc = foldBag (+-+) Zero
    

    Even if I demand the stated precondition, I can still use my foldNonAssoc to distinguish between Bag [One 1,One 2,One 3], which will fold to Two (One 1) (Two (One 2) (One 3)) and Bag [One 3,One 2,One 1], which will fold to Two (One 3) (Two (One 1) (One 2)) (notice that not all of the structure is preserved, but on a long list I'll get the entire list order back except for the ordering of the last two elements).

    A priori, if you combine all your items with an operation, you'll have a tree of applications, something like a +-+ (b +-+ (c +-+ d)). Commutativity will let you do some rearrangement, but no matter what you do, c will always be combined with d. So if you want that to be the same as (a +-+ c) +-+ (b +-+ d), you really need associativity, too.