Why do I have to call `sum` twice to sum a list of `Maybe Integer`s?

I wrote a solution for a very simple practice problem:

Calculate the number of grains of wheat on a chessboard given that the number on each square doubles. Write code that shows how many grains are on a given square, and the total number of grains on the chessboard.

The function must return a Maybe Integer and return Nothing if the input is less than 1 or greater than 64.

square :: Integer -> Maybe Integer
square n = if (n < 1 || n > 64) then Nothing else Just (2^(pred n))
total :: Integer
total = sum (fmap sum (map square [1..64]))

I tried to apply fmap sum to some test output of map square (list of Maybe Integer) in GHCI and was surprised to discover it returns the list of integers (sans Just) rather than their sum. So in the solution above I apply sum a second time to actually get the sum.

I would like to understand conceptually why this is the case: in other words why is sum behaving like a converter for Maybe Ints to Ints in this context instead of adding things?

I've solved some similar exercises relying on helper functions to avoid complications of doing computation on Maybe values, and perhaps in this case I should just calculate the value of total without utilizing square, i.e.:

total = sum [2^n | n <- [0..63]]

However since I'd already written a useful function my first instinct was to reuse it, which led to some unforeseen behavior.

Solution

One thing worth internalising; when you map a function over a list¹, the result is always going to be a list with the same number of elements. The function will be applied to each element of the list individually; it cannot combine them into a single summary value. That's just not what fmap does.

So with that principle in mind, we know that fmap sum squares (where squares = map square [1..64]) cannot possibly result in the sum of squares. It's going to be [ sum (square 1), sum (square 2), ... , sum (square 64) ]. We will then need to apply sum again to that whole list, if we want to actually add them up.

That just leaves an explanation for what sum (square 1) etc is actually doing (i.e. what sum does when applied to Maybe values). The proper type for sum is sum :: (Foldable t, Num a) => t a -> a. Foldable is basically the type-class of structures that you can scan for 0 or more elements in order (essentially: those you can convert to a list). All sum does is add up the elements that are there, however many there are (and use 0 as a "starting value" in case there are no elements). Maybe has a Foldable instance; it always has 0 or 1 elements, and sum can add those up just as well as it can add up lists that happen to have only 0 or 1 elements.

Of course the effect of summing zero-or-one numbers is just that the result is 0 if there were zero inputs and equal the input number if there was one. + never actually gets called for this "sum", which makes it feel a little pointless. But sum didn't know that; it works for any Foldable, regardless of how many elements they contain. And Maybe didn't know that it would end up being used to "sum without actually adding"; it just implemented Foldable so that any other function that wants to scan a variable number of elements out of a structure can scan Maybes.

If you think it's silly, just don't use sum for that job; use fromMaybe 0 instead.

¹ You can generalise this to other functors beyond lists; when you fmap a function over a data structure, the result will have exactly the same structure as the input. Only the "leaf nodes" will be different.