What is the mathematical theory or theorem underlying join of monad?

For example:

Maybe (Maybe Bool) -> Maybe Bool    
    
Just (Just True) -----> Just True     
Just (Just False) ----> Just False    
Just (Nothing) -------> Nothing    
Nothing --------------> ?

It would map Nothing to Nothing. What is the mathematical theory or theorem underlying it? If related to category theory, what part is it related to?

Is there a mathematical theory related to the Behavior of join of State, Writer, Reader, etc?

Is there any guarantee that m(m a) -> m a is safe?

edited:

Since result type a of m a -> a forgets the structure. so instead in order not to forget, result type m a of m (m a) -> m a is used for the compound effect of outer - m (...) with effect of inner - m.

Strictly speaking, two information(effect) was compounded into one only before the structure disappeared. The structure no longer exists.

I thought it was important to guarantee that there was no problem in doing so. Is it up to the programmer without any special rules or theory?

The compound doesn't look natural to me, it looks artificial.
Sorry for the vague question, thanks for all the comments.

Solution

The Mathematical definition of a monad, with translation to Haskell:

A couple of incidental preliminary notes

I'm going to assume you're familiar with Functors in Haskell. If not I'm tempted to direct you to my explanation on this other question here. I'm not going to explain category theory to you except by translating it into Haskell as best I can.

Identity functor vs Identity Functor instance.

Note: Firstly let me point out that the identity functor in mathematics does nothing, whereas the Identity functor in Haskell adds a newtype wrapper. Whenever we use the mathematical identity functor on a type a we should just get a back, so I won't be using the Identity functor instance.

Natural transformations

Secondly, note that a natural transformation between two functors (either possibly the identity), in Haskell is a polymorphic function e between two types made by (possibly mathematical identity) Functor instances, for example [a] -> Maybe a or (Int,a) -> Either String a such that e . fmap f == fmap f . e.

So safeLast : [a] -> Maybe a is a natural transformation, because safeLast (map f xs) == fmap f (safeLast xs), and even

rejectSomeSmallNumbers :: (Int,a) -> Either String a
rejectSomeSmallNumbers (i,a) = case i of
   0 -> Left "Way too small!"
   1 -> Left "Too small!"
   2 -> Left "Two is small."
   3 -> Left "Three is small, too."
   _ -> Right a

is a natural transformation because rejectSomeSmallNumbers . fmap f == fmap f . rejectSomeSmallNumbers :: (Int,a) -> Either String b.

A natural transformation can use as much information as it likes about the two functors it connects (eg (,) Int and Either String) but it can't use any information about the type a any more than the functors can. It shouldn't be possible to write a polymorphic function between two valid functor types that's not a natural transformation. See this answer for more information.

What is a monad according to Maths and Haskell?

Let H be a category (let Hask be the kind of all haskell types together with function types, functions, etc etc).

A monad on H is (a monad in Hask is)

an endofunctor M : H -> H
- a type constructor m: * -> * which has a Functor instance with fmap :: (a -> b) -> (m a -> m b)
and a natural transformation eta : 1_H -> M
- a polymorphic function from a -> m a, called pure defined in the Applicative instance
a natural transformation mu : M^2 -> M
- a polymorphic function from m (m a) -> m a, called join that is defined in Control.Monad

such that the following two rules hold:

mu . M mu == mu . mu M as natural transformations M^3 -> M
- join . fmap join == join . join :: m (m (m a)) -> m a
mu . M eta == mu . eta M == 1_H as natural transformations M -> M
- join . fmap pure == join . pure == id :: m a -> m a

What do these two rules mean?

Just to give you a handle on what these two conditions are saying, here they are when we're using lists:

(join . fmap join) [xss, yss, zss] 
== join [join xss, join yss, join zss] 
== join (join [xss, yss, zss])

and

join (fmap pure xs)
 == join [[x] | x <- xs]
 == xs
 == id xs
 == join [xs]
 == join (pure xs)

(Fun fact, join isn't part of the monad definition. I have a perhaps unreliable memory that it used to be, but in Control.Monad it's defined as join x = x >>= id and as commented there, it could be defined as join bss = do { bs <- bss ; bs })

What does this mean for the `Maybe` monad in your example?

Well firstly, because join is polymorphic (mu is a natural transformation), it can't use any information about the type a in Maybe a, so we couldn't for example make it so that join (Just (Just False)) = Just True or join (Just Nothing) = Just False because we can only use values that are already in the Maybe a we're given:

join :: Maybe (Maybe a) -> Maybe a
join Nothing = Nothing          -- can't provide Just a because we have no a
join (Just Nothing) = Nothing   -- same reason
join (Just (Just a)) =                 
    -- two choices: we could do the obviously correct `Just a` or collapse everything with `Nothing`.

pure :: a -> Maybe a
pure a =                               
        -- two choices: we could do the obviously correct `Just a` or collapse everything with `Nothing`.

What stops us doing the crazy Nothing thing?

Let's look at the two rules, specialising to Maybe, and to the Just branches, because all the Nothings are inevitably Nothing because of polymorphism.

(join . fmap join) (Just maybemaybe) 
== join (Just (join maybemaybe)) 
== join (join (Just maybemaybe)) -- required by he rule

That one works if we put Just a in the definition, or if we put Nothing, too.

In the second rule:

join (fmap pure (Just a))
 == join (Just (pure a))
 == join (pure (Just a))
 == id (Just a)           -- by the rule
 == Just a

Well that forces pure to be Just, and at the same time forces join (Just (Just a)) to give us Just a.

Reader

Let's ditch the newtype wrapping to make the laws easier to talk about.

type Reader input a = input -> a

We'd need

join :: Reader input (Reader input a) -> Reader input a
join (make_an_a_maker :: (input -> (input -> a)) :: input -> a
join make_an_a_maker input = (make_an_a_maker input) input

There isn't anything else we can do without using undefined or similar.

So what stops you making crazy `join` functions?

Most of the time, the fact that you're making a polymorphic function, some of the time because you want to do the obviously correct thing and it works, and the rest of the time because you chose to follow the rules.

Not-relevant nerd note:

I prefer to think of Monads as type constructors m so that Kleisli composition is associative, with the unit being pure:

(>=>) :: (a -> m b) 
      -> (       b -> m c)
      -> (a -> m        c)
(first >=> second) a = do
    b <- first a
    c <- second b
    return c

or if you prefer

(first >=> second) a =
  first a >>= \b -> second b

so the laws are

(one >=> two) >=> three == one >=> (two >=> three) and
k >=> pure == pure >=> k == k