Search code examples
haskelldistributionfrequency

Dist of a Frequency in Haskell using a call to foldmap


I am very new to Haskell. What could I replace the "undefined" in the definition below so that the "frequencies" calculates the frequency distribution of items in the input list. For example, the expression "frequencies [True, False, True]" should produce a distribution where True has a frequency of 2 and False has a frequency of 1.

I can add new top-level function definitions, but not modify any other part of the definition of "frequencies". In particular, I can't add any additional named arguments or remove the call to "foldMap" in the definition of "frequencies".

type Dist a = a -> Sum

frequencies :: Eq a => [a] -> Dist a
 frequencies = foldmap = undefined

Solution

  • You can implement this by following the types using GHC typed holes, entering an underscore _ to ask the compiler what type of an expression should be.

    import Data.Semigroup (Sum(..))
    
    -- Using the standard 'Sum' for illustration.
    type Dist a = a -> Sum Int
    
    frequencies :: (Eq a) => [a] -> Dist a
    frequencies = foldMap _
    

    Here, the compiler reports that _ :: a -> Dist a. So we’re given each a value in the input, and must produce a corresponding Dist a, which are then combined into a final result with foldMap. This relies on the fact that there are Semigroup and Monoid instances for functions, which just combine their results:

    instance (Semigroup m) => Semigroup (a -> m) where
      f <> g = \ x -> f x <> g x
    
    instance (Monoid m) => Monoid (a -> m) where
      mempty = \ _x -> mempty
    

    Dist a is a function that accepts an a and returns a Sum, which may be combined using its Semigroup/Monoid instances to add the results. So we want to introduce a lambda:

    frequencies :: (Eq a) => [a] -> Dist a
    frequencies = foldMap (\ x -> _)
    

    The hole now has type Dist a, which is a function, so we introduce another parameter:

    frequencies :: (Eq a) => [a] -> Dist a
    frequencies = foldMap (\ x -> \ y -> _)
    

    Now we must produce a Sum Int value. All we have are two a values, and a constraint that a can be compared with Eq. So let’s introduce an if to test their equality:

    frequencies :: (Eq a) => [a] -> Dist a
    frequencies = foldMap (\ x -> \ y -> if x == y then _ else _)
    

    What Sum values should we use for the holes? Presumably we want to add 1 to the total if the values are equal, and nothing if they differ. So we can use Sum 1 for the true case, and mempty (or Sum 0) for the false case. We can also collapse the lambda using the normal syntactic sugar for multiple parameters.

    frequencies :: (Eq a) => [a] -> Dist a
    frequencies = foldMap (\ x y -> if x == y then Sum 1 else mempty)
    

    Now we can call this function on a list, and it will produce a function that gives us the frequency for a particular element.

    > f = frequencies "AAB"
    > :t f
    f :: Dist Char
    > f 'A'
    Sum {getSum = 2}
    > f 'B'
    Sum {getSum = 1}
    

    From this, we can build more interesting things, like a histogram:

    import Data.List (nub)
    
    histogram :: (Ord a) => [a] -> [(a, Int)]
    histogram xs = let
      keys = nub (sort xs)
      frequency = frequencies xs
      in zip keys (map (getSum . frequency) keys)
    
    > histogram "AAB"
    [('A',2),('B',1)]