Search code examples
haskellfold

Folding over a list and counting all occurrences of arbitrarily many unique and unknown entities


I am using the Control.Foldl library to traverse an arbitrarily long list and counting all occurrences of arbitrarily many unique entities. Ie, the list may be of form

[Just "a", Just "b", Just "aab", Nothing, Just "aab"]

and I my result should be of form

[(Just "a",1),(Just "b",1) (Just "aab", 2), (Nothing, 1)]

Now the issue is I do not have the name of these entities a priori, and I would like to dynamically update the results as I fold.

My problem is that I do not know how to describe this computation in terms of the Fold a b data type from foldl. Specifically, at each step of the fold I need to traverse the result list and ask if I have seen the current item, but I see no way of describing this using foldl.


Solution

  • In addition to the other answers, I'd like to bring your attention to the concept of monoids. It's an abstraction for combining a sequence of elements (including 0-length) using an associative operation.

    In this case, the monoid will be a map of elements to numbers (their count), with the empty element being an empty map and the combining operation merges the two maps, summing values for keys present in both.

    {-# Language DerivingVia        #-}
    {-# Language DerivingStrategies #-}
    
    import Data.Foldable
    import qualified Data.Map as M
    import Data.Monoid
    -- https://hackage.haskell.org/package/monoidal-containers
    import Data.Map.Monoidal
    
    newtype CountMap k = CountMap { getCountMap :: M.Map k Int }
      -- (<>)   = M.unionWith (+)
      -- mempty = M.empty
      deriving (Semigroup, Monoid)
      via MonoidalMap k (Sum Int)
    
      deriving
      stock (Eq, Ord, Show, Read)
    
    singleton :: k -> CountMap k
    singleton x = CountMap $ M.singleton x 1
    
    unique :: (Foldable f, Ord k) => f k -> [(k, Int)]
    unique = M.toList . getCountMap . foldMap singleton
    

    While solutions described using monoids aren't not necessarily the shortest ones, they often express the main idea more clearly and on a higher level than than folds.

    Also for structures other than lists, for example trees, combining elements using monoids is more natural (and in some cases more efficient): Each leaf is converted to its corresponding value in the monoid and then the values are combined bottom up.

    See also Monoids and Finger Trees.