Can someone explain the logic behind this permutations generation code?

Frnakly, it looks like black magic to me. Granted, my knowledge of Haskell is fairly limited, but I can understand basic principles. Just because it makes matters that much easier I try to work through a test input, but in this case it still doesn't make sense.

Background: About a year ago I posted this question (Are recursive calls in my "permutations with repetition" code accumulated to clog the RAM?) about a piece of permutation generation code I was trying to get working. I received quite a lot of help from this wonderful community and went on with my project. However, there was one last answer I didn't think much of at the time, from user "peter pun" that I revisited recently.

Part of the answer, the important one, that generates permutations is this

permutationsNub :: Eq a => [a] -> [[a]]
permutationsNub = foldr (concatMap . insert) [[]]
    where insert y = foldr combine [[y]] . (zip <*> tail . tails) -- tails through importing Data.List
              where combine (x, xs) xss = (y : x : xs) :
                        if y == x then [] else map (x :) xss

Allow me to reform the function in a way that makes more sense for me, so that I can see every argument and refer to it, because of partial application in the original and variable names that confused me. Please tell me if the interpretation is correct.

permutationsNub :: Eq a => [a] -> [[[a]]]
permutationsNub digits = scanr (concatMap . insert) [[]] digits
    where insert digit perms = foldr combine [[digit]] $ (zip <*> tail . tails) perms -- zip will produce  tuples of the form (a, [a])
              where combine (permX, tailOfPerms) digitAsDoubleList = (digit : permX : tailOfPerms) : 
                      if digit == permX then [] else map (permX :) digitAsDoubleList

I replaced the main foldr function with scanr (the output type also) to see intermediate steps, so if I run permutationNub [2,7,7,8] the result is [[[2,7,7,8],[7,2,7,8],[7,7,2,8],[7,7,8,2],[2,7,8,7],[7,2,8,7],[7,8,2,7],[7,8,7,2],[2,8,7,7],[8,2,7,7],[8,7,2,7],[8,7,7,2]],[[7,7,8],[7,8,7],[8,7,7]],[[7,8],[8,7]],[[8]],[[]]] which gives some insight.

So, I have to ask:

First of all, why is applicative notation (<*>) used next to zip? Is it dictated by the use of point free notation? How do applicatives creep into this?
My renaming of variables so that they make sense to me seems fine, because the function works, aka produces the same output as the original. However, some things are strange, e.g how can digit ever be equal to permX as per the comparison check in the combine function? The former is an Int and the latter is a [Int], otherwise cons-ing like digit:permX:... in the same function wouldn't work, right? For example, if we call permutationNub [2,7,7,8], on the second pass of the main foldr (or scanr) function, digit will be 7 and permX will [8], so how can these two ever be equal?
It gets stranger from here. Well, both of them can then be cons-ed to tailOfPerms, which is an [[Int]], but then this whole thing, enclosed in brackets is cons-ed to an ... [[Int]]? Again?

Actually, map (permX :) digitAsDoubleList doesn't seem to work (yet, somehow it does...) because we're trying to map [Int] to a [[Int]], as is the result, and then cons to that the previous [[Int]], which I don't see how it could fit. For example, on the second pass of the permutationNub, with 7 as the next digit, picked by the main foldr/scanr function, first pass of insert` will be (7:[8]:[]):[[8]:[[7]]], which I can't make sense of.

Note: As a higher level explanation the author of the code says The basic idea is that, given the unique permutations of a list's tail, we can compute the unique permutations of the whole list by inserting its head into all of the tail's permutations, in every position that doesn't follow an occurrence of the head (to avoid creating duplicates). So, as an example, let's say that current permutations of the tail are [[7,8],[8,7]] (see pass 2) and the next digit is 7. So, the algorithm will insert 7 at the beginning of the first permutation (producing [7,7,8], then at the end (producing [7,8,7] and then only at the beginning of the second permutation (producing [8,7,7])? How can it tell that it doesn't need to elsewhere? What does the author mean by in every position that doesn't follow an occurrence of the head?

If anyone can shed some light into this piece of coding wizardry, please do so!

Solution

This algorithm is just a modification of the classic insertion-based one, which you can find for example in Rosetta Code. To my surprise, despite its simplicity, I wasn't able to find it mentioned anywhere.

First of all, for the insertions, instead of using explicit recursion (considered harmful), I simulated, somewhat clumsily, a paramorphism for lists. Then I found that, if we stop producing insertions after finding an occurrence of the value to be inserted, we can prevent the creation of duplicates while at the same time producing every permutation. I hope the following code clarifies things:

parafoldr :: ((a, [a]) -> b -> b) -> b -> [a] -> b
parafoldr f y xs = foldr f y $ zip xs $ tail $ tails xs

insertions :: a -> [a] -> [[a]]
insertions y = parafoldr combine [[y]]
    where combine (x, xs) xss = (y : x : xs) : map (x :) xss
--or equivalently
insertions y xs = (y : xs) : (parafoldr combine [] xs)
    where combine (x, xs) xss = (x : y : xs) : map (x :) xss

permutations :: [a] -> [[a]]
permutations = foldr (concatMap . insertions) [[]]

insertions' :: Eq a => a -> [a] -> [[a]]
insertions' y = parafoldr combine [[y]]
    where combine (x, xs) xss = (y : x : xs) :
              if x == y then [] else map (x :) xss
--or equivalently
insertions' y xs = (y : xs) : (parafoldr combine [] xs)
    where combine (x, xs) xss
              | x == y    = []
              | otherwise = (x : y : xs) : map (x :) xss

permutationsNub :: Eq a => [a] -> [[a]]
permutationsNub = foldr (concatMap . insertions') [[]]

It's not too hard to show by induction that:

given the correctness of permutations, permutationsNub produces every permutation that permutations produces and, of course, no other lists
the permutations produced by permutationsNub are pairwise different

But actually, I arrived at this algorithm indirectly. Permutations fit naturally in a relational setting (look at the answer's end for references). In the following I use an informal pseudo-Haskell notation to write about relations. I denote by a ~> b the type of relations from a to b. I write r(x, y) when r relates x to y. When defining a relation, I suppose that the relation defined is the least one satisfying the given conditions. I denote by relfoldr (relunfoldr) the operation of relational catamorphism (anamorphism) for lists and by converse (guess what) the converse operation.

So consider the following relations:

permutation :: [a] ~> [a]; permutation(x, y) if y is a permutation of x
insertion :: Maybe (a, [a]) ~> [a]; insertion(Nothing, []) and insertion(Just (x, xs), xs') if xs' results from inserting x somewhere in xs
insertion' same as insertion but with "xs' results from inserting x somewhere in xs but not after an occurrence of it "
selection :: [a] ~> Maybe (a, [a]); selection([], Nothing) and selection(xs, Just (x, xs')) if xs' results from deleting an occurrence of x in xs
selection' same as selection but with "xs' results from deleting the first occurrence of x in xs"

Notice that permutation == converse permutation, insertion == converse selection and insertion' == converse selection'. Knowing that relfoldr (converse r) == converse (relunfoldr r), I proceeded roughly as follows:

permutation ==
relfoldr insertion ==
converse (relunfoldr selection) ==
converse (relunfoldr selection') ==
relfoldr insertion'

The first equality is known. Because of the symmetry of permutation we have permutation == relunfoldr selection, which helps us justify that relunfoldr selection == relunfoldr selection' and thus the third equality.

Now, with a bit of imagination, we can see that:

relunfoldr selection gives the classic selection-based algorithm
relunfoldr selection' gives a similar selection-based algorithm that doesn't create duplicates
relfoldr insertion gives the classic insertion-based algorithm (named permutations above)
relfoldr insertion' gives a similar insertion-based algorithm that doesn't create duplicates (named permutationsNub above)

Some advantages of the insertion-based no-duplicate algorithm over the selection-based one are that:

the latter needs some kind of memory to check if a selection has already been made, which makes it less efficient and possibly less general (depending on the memory's type, it may require Ord a, Hashable a or Int instead of just Eq a)
the former has a simple implementation using foldr

Before answering the old question, I made an attempt to simulate several things from REL in CPO and implement them in Haskell but I found out that there were some limitations and a lot of details that I needed to work out. I was also interested in properly handling infinity via laziness but some tricks I tried, like diagonalization/dovetailing and using NonEmpty lists, didn't work and I wasn't sure about the appropriate theoretical framework either. I might revisit this area some time, if I find the time/energy/motivation.

Finally, here are a few references:

The classic bananas paper treats recursion schemes (and more) in CPO.
The book Algebra of Programming focuses on the relational setting. In pages 130-131 it touches on permutations and derives the classic insertion-based algorithm.
The book Algorithm Design with Haskell is more practically-oriented but still gives an emphasis on equational reasoning. Permutations have a distinct presence in it and both selection/insertion-based basic algorithms are presented.
In the lecture notes "Algebraic and Coalgebraic Methods in the Mathematics of Program Construction" you can find some quite pedagogical expositions of related matters. Chapter 5 takes the functional approach (SET, CPO) while chapter 8 adopts the relational one.