While I was reading chapter 13 of Real World Haskell, I came upon the concept of difference lists.
The author says that in an imperative language, appending an element to a list costs O(1), since we keep a pointer to the last element. In Haskell, however, lists are immutable, so every time we append we have to traverse the whole list and attach the element at the end, which is O(n).
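For reference, (++) is defined essentially like this (renamed to append here so it doesn't clash with the Prelude), which is why each append has to walk the whole left operand:

```haskell
-- Essentially the Prelude definition of (++): the spine of the left
-- list is rebuilt cell by cell, so the cost is proportional to the
-- length of the left operand; the right operand is shared untouched.
append :: [a] -> [a] -> [a]
append []     ys = ys
append (x:xs) ys = x : append xs ys
```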
Instead of [1,2,3] ++ [4] ++ [5], we could use partial application: ((++[5]) . (++[4])) [1,2,3].
I do not understand how this is more efficient, since:
- when I do (++[4]) [1,2,3] it is still O(n), and then another O(n) for (++[5]).
Quote from the book:

"There are two very interesting things about this approach. The first is that the cost of a partial application is constant, so the cost of many partial applications is linear. The second is that when we finally provide a [] value to unlock the final list from its chain of partial applications, application proceeds from right to left. This keeps the left operand of (++) small, and so the overall cost of all of these appends is linear, not quadratic."
I understand that this approach would be eager, so instead of keeping thunks of yet-to-be-applied functions we keep the left operand small, as the author says, but... we still perform a traversal on each append.
Given a list [1..100] and wanting to append 1 and then 2, I would still traverse it 2 times, since I would do:

f (f [1..100]) = f [1..100, 1]  -- traversed once and appended 1
               = [1..100, 1, 2] -- traversed a second time to append 2
Can someone enlighten me as to how this is more efficient in time complexity? (The improvement in space complexity is due to there being no more thunks, as with foldl'.)
P.S. I am aware of the canonical answer, and I have also read this chapter, which I found very helpful.
I know that you can achieve O(1) complexity per step by adding to the left with :, but it wouldn't be similar to ++. If I use f = (:) on a = [1,2,3]:

(f 4 . f 5) $ a

I could say that I had O(1) efficiency on each step, since I always added at the front, but I would not get [1,2,3,4,5]; I would get [4,5,1,2,3]. So how, in this case, is a difference list more efficient for the unitary operation of appending one element?
You need to be more careful about the timing, i.e. about when this or that thing actually happens.
Instead of starting with a list [1,2,3], with difference lists we start with
f1 = ([1,2,3] ++)
Then, to "add" 4 and 5 to the end of the growing difference list, we have
f2 = f1 . ([4] ++)
f3 = f2 . ([5] ++)
Each such addition into the end of the growing difference list is O(1).
When we're finally done building it, we convert it into a "normal" list by application
xs = f3 [] -- or f3 [6..10] or whatever
Then, carefully, we get
xs = ((([1,2,3] ++) . ([4] ++)) . ([5] ++)) []
= (([1,2,3] ++) . ([4] ++)) ( ([5] ++) [] )
= ([1,2,3] ++) ( ([4] ++) ( ([5] ++) [] ))
= 1:2:3: ( 4 : ( 5 : [] ))
by the definition of (++)
.
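If it helps, the same trick can be packaged as a tiny type. This is only a sketch of the general idea; the names DList, fromList, snoc and toList are my own, not from the book or the linked answer:

```haskell
-- A minimal difference-list sketch: a list is represented by the
-- function that prepends it to whatever comes next.
newtype DList a = DList ([a] -> [a])

fromList :: [a] -> DList a
fromList xs = DList (xs ++)

-- Appending one element at the end is O(1): we only compose functions,
-- nothing is traversed here.
snoc :: DList a -> a -> DList a
snoc (DList f) x = DList (f . (x :))

-- Only here, when the composed function is finally applied to [], do we
-- pay for a traversal -- once, left to right, so the total cost is linear.
toList :: DList a -> [a]
toList (DList f) = f []

main :: IO ()
main = print (toList (fromList [1,2,3] `snoc` 4 `snoc` 5))  -- [1,2,3,4,5]
```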
A canonical answer: Why are difference lists more efficient than regular concatenation?
Even a1 = (++ [4]) [1..]
by itself is an O(1) operation, as is a2 = (++ [5]) a1
and a3 = (++ [6]) a2
, because Haskell is lazy, and thunk creation is O(1).
It's only when we access the final result that the overall operation becomes quadratic, because accessing the ++ structure doesn't rearrange it -- it remains left-nested, and so is quadratic on repeated access from the top.
The conversion to a normal list, by applying the left-nested . structure to [], rearranges that structure internally into the right-nested $ structure, as explained in the canonical answer, so repeated access to such a structure from the top is linear.
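To see the linear-vs-quadratic difference at scale, here is a small comparison of my own (not from the book or the linked answer): both functions build the same list of n elements, but the naive one re-traverses the growing prefix on every step, while the difference-list build only composes functions and pays for a single traversal at the end.

```haskell
-- Naive repeated append: each step copies the whole accumulator when
-- forced, so building n elements costs O(n^2) overall.
buildNaive :: Int -> [Int]
buildNaive n = foldl (\acc x -> acc ++ [x]) [] [1..n]

-- Difference-list style: each step only composes another ([x] ++)
-- section, O(1) per step; the single application to [] at the end is O(n).
buildDiff :: Int -> [Int]
buildDiff n = foldl (\f x -> f . ([x] ++)) id [1..n] []

main :: IO ()
main = print (buildNaive 1000 == buildDiff 1000)  -- True
```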
So the difference is between ((++ [5]) . (++ [4])) [1,2,3]
(bad) and ((([1,2,3] ++) . ([4] ++)) . ([5] ++)) []
(good). Building the function chain ((++ [5]) . (++ [4])) is in itself linear, yes, but it creates a structure that is quadratic to access in full.
But ((([1,2,3] ++) . ([4] ++)) . ([5] ++)) [] becomes ([1,2,3] ++) (([4] ++) (([5] ++) [])), which is the right-nested structure that is linear to access.
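Both the bad and the good expression denote the same list; only the way the work is associated differs. A quick GHCi check (my own session):

```
ghci> ((++ [5]) . (++ [4])) [1,2,3]
[1,2,3,4,5]
ghci> ((([1,2,3] ++) . ([4] ++)) . ([5] ++)) []
[1,2,3,4,5]
```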