Search code examples
haskellmonoids

How to make my probability density type an instance of Monoid?


I have a type to describe post-calibration radiocarbon date probability distributions. The details and background don't matter for the question: It boils down to one probability value in _calPDFDens for each year in _calPDFCals:

data CalPDF = CalPDF {
    -- | Sample identifier, e.g. a lab number
      _calPDFid :: String
    -- | Years calBCAD
    , _calPDFCals :: VU.Vector YearBCAD
    -- | Probability densities for each year in '_calPDFCals'
    , _calPDFDens :: VU.Vector Float
    }

(VU is Data.Vector.Unboxed)

Now: It is common practice to sum multiple such distributions to derive a sum probability distribution. That means a full outer join on the years in _calPDFCals and then summing the respective values in _calPDFDens. I implemented this as follows:

sumPDFs :: CalPDF -> CalPDF -> CalPDF
sumPDFs = combinePDFs (+)

combinePDFs :: (Float -> Float -> Float) -> CalPDF -> CalPDF -> CalPDF
combinePDFs f (CalPDF name1 cals1 dens1) (CalPDF name2 cals2 dens2) = 
    let startRange = minimum [VU.head cals1, VU.head cals2]
        stopRange = maximum [VU.last cals1, VU.last cals2]
        emptyBackdrop = zip [startRange..stopRange] (repeat (0.0 :: Float))
        pdf1 = VU.toList $ VU.zip cals1 dens1
        pdf2 = VU.toList $ VU.zip cals2 dens2
        pdfCombined = fullOuter f pdf2 (fullOuter f pdf1 emptyBackdrop)
        pdfNew = CalPDF (name1 ++ "+" ++ name2) (VU.fromList $ map fst pdfCombined) (VU.fromList $ map snd pdfCombined)
    in normalizeCalPDF pdfNew
    where
        -- https://stackoverflow.com/questions/24424403/join-or-merge-function-in-haskell
        fullOuter :: (Float -> Float -> Float) -> [(YearBCAD, Float)] -> [(YearBCAD, Float)] -> [(YearBCAD, Float)]
        fullOuter _ xs [] = xs
        fullOuter _ [] ys = ys
        fullOuter f xss@(x:xs) yss@(y:ys)
            | fst x == fst y = (fst x, f (snd x) (snd y)) : fullOuter f xs ys
            | fst x < fst y  = x                          : fullOuter f xs yss
            | otherwise      = y                          : fullOuter f xss ys

I was wondering if I could rewrite this code, so that CalPDF becomes an instance of Monoid and sumPDFs becomes <>.

The issue I can not overcome and which lead me to post is question, is how mempty should look like. I already have something like this in combinePDFs: emptyBackdrop. This is required in my implementation, to fill or complete years in between both input PDFs, if they do not overlap.

emptyBackdrop fulfills some of the requirements for mempty, but it depends on the input PDFs. Theoretically, the true mempty would be a CalPDF, which starts at the beginning of time, ends at the end of time and attributes each of these infinite years a probability of zero. But this can not be implemented with unboxed vectors.

Is there an elegant way to make CalPDF and instance of Monoid? Would it be useful already to make it an instance of Semigroup with what I have already?


Edit: As suggested by @leftaroundabout here is a reproducible, minimal implementation of the setup described above.

main :: IO ()
main = do
    let myPDF1 = [(1,1), (2,1), (3,1)]
        myPDF2 = [(2,1), (3,1), (4,1)]
    putStrLn $ show $ sumPDFs myPDF1 myPDF2

type CalPDF = [(Int, Float)]

sumPDFs :: CalPDF -> CalPDF -> CalPDF
sumPDFs pdf1 pdf2 = 
    let startRange = minimum [fst $ head pdf1, fst $ head pdf2]
        stopRange = maximum [fst $ last pdf1, fst $ last pdf2]
        emptyBackdrop = zip [startRange..stopRange] (repeat (0.0 :: Float))
        pdfCombined = fullOuter pdf2 (fullOuter pdf1 emptyBackdrop)
    in pdfCombined
    where
        fullOuter :: [(Int, Float)] -> [(Int, Float)] -> [(Int, Float)]
        fullOuter xs [] = xs
        fullOuter [] ys = ys
        fullOuter xss@(x@(year1,dens1):xs) yss@(y@(year2,dens2):ys)
            | year1 == year2 = (year1, dens1 + dens2) : fullOuter xs ys
            | year1 < year2  = x                      : fullOuter xs yss
            | otherwise      = y                      : fullOuter xss ys

Solution

  • Consider reworking your type a bit.

    import Data.Map (Map)
    import qualified Data.Map as M
    
    data CalPDF = CalPDF
        { _calPDFid :: [String]
        , _calPDFdens :: Map YearBCAD Float
        }
    

    The instances can now be quite short indeed:

    instance Semigroup CalPDF where
        CalPDF id dens <> CalPDF id' dens' = CalPDF
            (id <> id')
            (M.unionWith (+) dens dens')
    
    instance Monoid CalPDF where
        mempty = CalPDF mempty mempty
    

    I used [String] in place of a single +-delimited String for two reasons: it allows you to use + in your names without ambiguity, and it makes <> a bit simpler as you don't need to avoid adding a + when one or the other argument is the empty String to remain law-abiding. Pretty-printers can still show this with +s using e.g. intercalate "+".

    You could use HashMap or IntMap in place of Map in essentially the same way if one of those fits your needs better.