Search code examples
dictionaryhaskellsetmultimapmultiset

Deriving associative containers from map


There is Map type (associative array) in Data.Map.Lazy module in containers package:

data Map k a = ...

I could derive Set from this type:

import qualified Data.Map.Lazy as Map
newtype Set a = Set (Map.Map a ())

But the Data.Set module in the same package doesn't. Is that because of performance issues?

Furthermore, I could derive C++'s std::multiset and std::multimap equivalent by similar method:

type Occur = Int
newtype MultiSet a = MultiSet (Map.Map a Occur)
newtype MultiMap k a = MultiMap (Map.Map k (MultiSet a))

For the containers package doesn't provide such types, I'm actually using my own implementations for these types.

The advantages were that it was easy to implement utilities for these types, such as (purely functional) equivalents of C++'s size, insert, and erase. For examples:

instance Foldable MultiSet where
    foldMap f (MultiSet xs) = Map.foldMapWithKey (\x o -> stimes o (f x)) xs
    toList = toAscList
    null (MultiSet xs) = null xs
    length = size

size :: MultiSet a -> Occur
size (MultiSet xs) = sum xs

insert :: Ord a => a -> MultiSet a -> MultiSet a
insert x = insertMany x 1

insertMany :: Ord a => a -> Occur -> MultiSet a -> MultiSet a
insertMany x n (MultiSet xs) = MultiSet (Map.alter (\maybeo -> case maybeo of
    Nothing -> Just n
    Just m -> maybeClampNeg (m + n)
    ) x xs)

clampNeg :: Occur -> Occur
clampNeg n = if n < 0 then 0 else n

maybeClampNeg :: Occur -> Maybe Occur
maybeClampNeg n = case clampNeg n of
    0 -> Nothing
    n' -> Just n'

delete :: Ord a => a -> MultiSet a -> MultiSet a
delete x = deleteMany x 1

deleteMany :: Ord a => a -> Occur -> MultiSet a -> MultiSet a
deleteMany x n (MultiSet xs) = MultiSet (Map.update (maybeClampNeg . subtract n) x xs)

Will there be performance issues in these implementations?


Solution

  • The Set type is not implemented as Map k () because in that implementation every entry is also carrying a value of (), which is pure overhead.

    Your multiset implementation is basically the same as the one in Hackage.

    This Multimap, on the other hand, is implemented as a map of lists. That doesn't imply that there is anything wrong with your proposal; in fact it might be better for some use-cases.

    However this does illustrate a wider design issue; the composibility of data structures like this in Haskell means that in practice its easier to create such structures ad-hoc than to try to create a library containing every possible combination of nested containers.