According to Haskell's library documentation, every instance of the Applicative class must satisfy the following four rules:
pure id <*> v = v
pure (.) <*> u <*> v <*> w = u <*> (v <*> w)
pure f <*> pure x = pure (f x)
u <*> pure y = pure ($ y) <*> u
It then says that, as a consequence of these rules, the underlying Functor instance will satisfy fmap f x = pure f <*> x. But since the method fmap does not even appear in the above equations, how exactly does this property follow from them?
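For concreteness, here is a quick GHCi spot check of the four rules at the Maybe instance (my own examples, not from the documentation); each line evaluates to True (or to the value shown):

(pure id <*> Just 3) == Just 3                                                                 -- identity
(pure (.) <*> Just (+1) <*> Just (*2) <*> Just 3) == (Just (+1) <*> (Just (*2) <*> Just 3))    -- composition
(pure (+1) <*> pure 3) == (pure 4 :: Maybe Int)                                                -- homomorphism
(Just (+1) <*> pure 3) == (pure ($ 3) <*> Just (+1))                                           -- interchange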
Update: I've substantially expanded the answer. I hope it helps.
For any functor F, there is a "free theorem" (see below) for the type:

(a -> b) -> F a -> F b

This theorem states that for any (total) function, say foo, with this type, the following will be true for any functions f, f', g, and h with appropriately matching types:

If f' . g = h . f, then foo f' . fmap g = fmap h . foo f.
Note that it is not at all obvious why this should be true.
Anyway, if you set f = id and g = id and use the functor law fmap id = id, this theorem simplifies to:

For all h, we have foo h = fmap h . foo id.
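To see what that's claiming, here is a small sketch at F = Maybe with two hypothetical implementations (the names fooA and fooB are mine). Both satisfy the simplified theorem, as parametricity demands, even though fooB ignores its arguments entirely:

fooA, fooB :: (a -> b) -> Maybe a -> Maybe b
fooA f x = fmap f x   -- the obvious implementation
fooB _ _ = Nothing    -- perverse, but well typed

-- both of these evaluate to True:
-- fooA (+1) (Just 2) == fmap (+1) (fooA id (Just 2))
-- fooB (+1) (Just 2) == fmap (+1) (fooB id (Just 2))    (both sides are Nothing)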
Now, if F is also an applicative, then the function:
foo :: (a -> b) -> F a -> F b
foo f x = pure f <*> x
has the right type, so it satisfies the theorem. Therefore, for all h, we have:
pure h <*> x
-- by definition of foo
= foo h x
-- by the specialized version of the theorem
= (fmap h . foo id) x
-- by definition of the operator (.)
= fmap h (foo id x)
-- by the definition of foo
= fmap h (pure id <*> x)
-- by the identity law for applicatives
= fmap h x
In other words, the identity law for applicatives implies the relation:
pure h <*> x = fmap h x
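This is a theorem, so it holds for every lawful applicative, but a quick GHCi spot check at a couple of standard instances may be reassuring:

(pure (+1) <*> Just 2) == fmap (+1) (Just 2)      -- True
(pure (+1) <*> [1,2,3]) == fmap (+1) [1,2,3]      -- True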
It is unfortunate that the documentation does not include some explanation, or at least an acknowledgement, of this extremely non-obvious fact.
Originally, the documentation listed the four laws (identity, composition, homomorphism, and interchange), plus two additional laws for *> and <*, and then simply stated:

The Functor instance should satisfy

fmap f x = pure f <*> x
The wording above was replaced with the new text:

As a consequence of these laws, the Functor instance for f will satisfy

fmap f x = pure f <*> x

as part of commit 92b562403 in February 2011, in response to a suggestion made by Russell O'Connor on the libraries list.
Russell pointed out that this rule was actually implied by the other applicative laws. Originally, he offered the following proof (the link in the post is broken, but I found a copy on archive.org). He observed that the function:
possibleFmap :: Applicative f => (a -> b) -> f a -> f b
possibleFmap f x = pure f <*> x
satisfies the Functor laws for fmap:
pure id <*> x = x {- Identity Law -}

pure (f . g) <*> x
= {- infix to prefix -}
pure ((.) f g) <*> x
= {- 2 applications of homomorphism law -}
pure (.) <*> pure f <*> pure g <*> x
= {- composition law -}
pure f <*> (pure g <*> x)
and then reasoned that:
So, \f x -> pure f <*> x satisfies the laws of a functor. Since there is at most one functor instance per data type, (\f x -> pure f <*> x) = fmap.
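Again, you can sanity-check Russell's calculation in GHCi (not a proof, just reassurance), say at the list instance:

possibleFmap id [1,2,3] == [1,2,3]                                                      -- True
possibleFmap ((+1) . (*2)) [1,2,3] == (possibleFmap (+1) . possibleFmap (*2)) [1,2,3]   -- True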
A key part of this proof is that there is only one possible functor instance (i.e., only one way of defining fmap) per data type. When asked about this, he gave the following proof of the uniqueness of fmap.
Suppose we have a functor f and another function

foo :: (a -> b) -> f a -> f b

Then as a consequence of the free theorem for foo, for any f :: a -> b and any g :: b -> c:

foo (g . f) = fmap g . foo f

In particular, if foo id = id, then

foo g = foo (g . id) = fmap g . foo id = fmap g . id = fmap g
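Note that the side condition foo id = id is doing real work in that last step. Here is a hypothetical counterexample of my own at f = Maybe:

fooConst :: (a -> b) -> Maybe a -> Maybe b
fooConst _ _ = Nothing

-- fooConst satisfies the free theorem, but fooConst id /= id, and indeed:
-- fooConst (+1) (Just 2)   is Nothing
-- fmap     (+1) (Just 2)   is Just 3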
Obviously, this depends critically on the "consequence of the free theorem for foo". Later, Russell realized that the free theorem could be used directly, together with the identity law for applicatives, to prove the needed law. That's what I've summarized in my "short answer" above.
So what about this "free theorem" business?
The concept of free theorems comes from a paper by Wadler, "Theorems for Free". Here's a Stack Overflow question that links to the paper and some other resources. Understanding the theory "for real" is hard, but you can think about it intuitively. Let's pick a specific functor, like Maybe. Suppose we had a function with the following type:
foo :: (a -> b) -> Maybe a -> Maybe b
foo f x = ...
Note that, no matter how complex and convoluted the implementation of foo is, that same implementation needs to work for all types a and b. It doesn't know anything about a, so it can't do anything with values of type a, other than apply the function f, and that just gives it a b value. It doesn't know anything about b either, so it can't do anything with a b value, except maybe return Just someBvalue. Critically, this means that the structure of the computation performed by foo -- what it does with the input value x, whether and when it decides to apply f, etc. -- is entirely determined by whether x is Nothing or Just ....

Think about this for a bit -- foo can inspect x to see if it's Nothing or Just someA. But, if it's Just someA, it can't learn anything about the value someA: it can't use it as-is because it doesn't understand the type a, and it can't do anything with f someA, because it doesn't understand the type b. So, if x is Just someA, the function foo can only act on its Just-ness, not on the underlying value someA.
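To make that concrete, here are, as far as I can tell, the only essentially different total implementations of this type (the names are mine):

fooFmap, fooNull :: (a -> b) -> Maybe a -> Maybe b
fooFmap _ Nothing  = Nothing        -- forced: there is no b value to put in a Just
fooFmap f (Just a) = Just (f a)     -- this one is fmap
fooNull _ _        = Nothing        -- discard the Just-ness entirely

-- foo can never map Nothing to Just something, because it has no way to
-- conjure up a value of type b on its own.

Either way, the structure of the computation is dictated entirely by the Nothing-ness or Just-ness of the input.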
This has a striking consequence. If we were to use a function g to change the input values out from under foo f x by writing:
foo f' (fmap g x)
then, because fmap g doesn't change x's Nothing-ness or Just-ness, this change has no effect on the structure of foo's computation. It behaves the same way, processing the Nothing or Just ... value in the same way, applying f' in exactly the same circumstances and at exactly the same time that it previously applied f, etc.
This means that, as long as we've arranged things so that f' acting on the g-transformed value gives the same answer as an h-transformed version of f acting on the original value -- in other words, if we have:
f' . g = h . f
then we can trick foo into processing our g-transformed input in exactly the same way it would have processed the original input, as long as we account for the input change after foo has finished running by applying h to the output:
foo f' (fmap g x) = fmap h (foo f x)
I don't know whether or not that's convincing, but that's how we get the free theorem:
If f' . g = h . f, then foo f' . fmap g = fmap h . foo f.
It basically says that because we can transform the input in a way that foo won't notice (because of its polymorphic type), the answer is the same whether we transform the input and run foo, or run foo first and transform its output instead.
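If it helps, here is the theorem instantiated with concrete functions of my own choosing that satisfy the side condition, taking foo = fmap as the simplest function of the right type:

f  = (+1); f' = (+2); g = (*2); h = (*2)
-- side condition: (f' . g) x = 2*x + 2 = (h . f) x

(fmap (+2) . fmap (*2)) (Just 5) == (fmap (*2) . fmap (+1)) (Just 5)   -- True: both are Just 12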