What is the advantage of using tail recursion here?

I have been reading articles describing how the space complexity of quicksort can be reduced by using a tail-recursive version, but I am not able to understand how this is so. The two versions are:

QUICKSORT(A, p, r)
   if p < r
      q = PARTITION(A, p, r)
      QUICKSORT(A, p, q-1)
      QUICKSORT(A, q+1, r)


TAIL-RECURSIVE-QUICKSORT(A, p, r)
   while p < r
      q = PARTITION(A, p, r)
      TAIL-RECURSIVE-QUICKSORT(A, p, q-1)
      p = q+1

(Source - http://mypathtothe4.blogspot.com/2013/02/lesson-2-variations-on-quicksort-tail.html)

As far as I understand, both of these make recursive calls on both the left and right halves of the array. In both cases, only one half is processed at a time, and therefore at any time only one recursive call is using stack space. I am unable to see how the tail-recursive quicksort saves space.

The pseudocode above is taken from this article: http://mypathtothe4.blogspot.com/2013/02/lesson-2-variations-on-quicksort-tail.html The explanation provided in the article confuses me even more:

Quicksort partitions a given sub-array and proceeds to recurse twice; one on the left-sub-array and one on the right. Each of these recursive calls will require its own individual stream of stack space. This space is used to store the indexing variables for the array at some level of recursion. If we picture this occurring from beginning to end of execution, we can see that the stack space doubles at each layer.

So how does Tail-Recursive-Quicksort fix all of this?

Well, instead of recursing on two sub-arrays, we now only recurse on one. This eliminates the need for doubling stack space at every layer of execution. We get around this problem by using the while loop as an iterative control that performs the same task. Instead of needing the stack to save sets of variables for two recursive calls, we simply alter the same set of variables and use the single recursive call on new variables.

I don't see how the stack space doubles at every layer of execution in the case of a regular quicksort.

Note: There is no mention of compiler optimization in the article.

Solution

A tail-recursive function call allows the compiler to perform a special optimization which it normally cannot perform with regular recursion. In a tail-recursive function, the recursive call is the very last thing to be executed. In this case, instead of allocating a stack frame for each call, the compiler can rework the code to simply reuse the current stack frame, meaning a tail-recursive function will only use a single stack frame as opposed to hundreds or even thousands.

This optimization is possible because the compiler knows that once the tail recursive call is made, no previous copies of variables will be needed, because there is no more code to execute. If, for instance, a print statement followed a recursive call, the compiler would need to know the value of the variable to be printed after the recursive call returns, and thus the stack frame cannot be reused.
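To make the distinction concrete, here is a minimal sketch in Python. The function names are made up for illustration, and note that CPython itself does not perform tail call optimization; the loop version only shows what a compiler that does perform it would effectively turn the tail-recursive version into.

```python
def sum_to_tail(n, acc=0):
    # Tail-recursive: the recursive call is the very last action, so
    # nothing in this frame is needed once the call has been made.
    if n == 0:
        return acc
    return sum_to_tail(n - 1, acc + n)


def sum_to_loop(n, acc=0):
    # Roughly what tail call elimination produces: the parameters are
    # overwritten in place and the same frame is reused on every step.
    while n != 0:
        n, acc = n - 1, acc + n
    return acc


def sum_to_not_tail(n):
    # NOT a tail call: the addition runs after the recursive call
    # returns, so this frame (and its n) must be kept alive.
    if n == 0:
        return 0
    return n + sum_to_not_tail(n - 1)
```

The loop version handles arguments far larger than Python's recursion limit, which is the space saving being described.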

Here's the wiki page if you'd like more information on how this "space saving" and stack reuse actually works, along with examples: Tail Call

Edit: I didn't explain how this applies to quicksort, did I? Well, some terms are thrown around in that article which make everything confusing (and some of it is just plain wrong). The first function given (QUICKSORT) makes a recursive call on the left, a recursive call on the right, and then exits. Notice that the recursive call on the right is the very last thing that happens in the function. If the compiler supports tail call optimization (explained above), only the left calls create new stack frames; all the right calls just reuse the current frame. This can save some stack frames, but it does not help when the partitioning is lopsided in the other direction, so that the left side is nearly the whole subarray every time; then almost all of the depth comes from left calls, which the optimization does not touch. Plus, even though right-side calls reuse the same frame, the left-side calls made within the right-side calls still use the stack. In the worst case, the stack depth is N.
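As an illustration of that worst case, the sketch below instruments the plain version to report the deepest recursion level it reaches. It assumes a Lomuto-style PARTITION that uses the last element as the pivot (the article's actual PARTITION is not shown in the question), and `quicksort_depth` is a hypothetical name. With that pivot choice, an already sorted input puts everything on the left at every step, so the depth grows linearly with the input size, and reusing the frame for right-side calls would not change that.

```python
def partition(a, p, r):
    # Lomuto partition (assumed): pivot is the last element a[r].
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def quicksort_depth(a, p, r, depth=1):
    # Plain two-call quicksort; returns the deepest recursion level reached.
    if p >= r:
        return depth
    q = partition(a, p, r)
    left = quicksort_depth(a, p, q - 1, depth + 1)
    right = quicksort_depth(a, q + 1, r, depth + 1)
    return max(left, right)


data = list(range(200))            # already sorted: worst case for this pivot
worst = quicksort_depth(data, 0, len(data) - 1)
print(worst)                       # depth equals the input size here
```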

The second version described is not a tail-recursive quicksort, but rather a quicksort where only the left sorting is done recursively, and the right sorting is done using the loop. In fact, this quicksort (as previously described by another user) cannot have the tail recursion optimization applied to it, because the recursive call is not the last thing to execute.

How does this work? When implemented correctly, the first call to quicksort is the same as a left-side call in the original algorithm. However, no right-side recursive calls are ever made. The loop takes care of that: instead of sorting "left then right", it sorts the left with a call, then sorts the right by repeatedly partitioning it and sorting only the left part of what remains. It sounds ridiculous, but it's basically just sorting so many lefts that the rights shrink to single elements and don't need to be sorted. This effectively removes the right recursion, making the function less recursive (pseudo-recursive, if you will).

However, the real implementation does not choose just the left side each time; it chooses the smaller side. The idea is still the same: it only does a recursive call on one side instead of both. Picking the smaller side ensures that the stack depth can never be much larger than log2(N), which is the depth of a balanced binary tree. This is because the smaller side is always at most half the size of the current array section. The implementation given by the article does not ensure this, however, because it can suffer from the same worst-case scenario of "left is the whole tree". This article actually gives a pretty good explanation of it if you're willing to do more reading: Efficient selection and partial sorting based on quicksort
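A minimal sketch of the "recurse on the smaller side, loop on the larger side" variant follows. The Lomuto-style `partition` (last element as pivot) and the name `quicksort_smaller_first` are assumptions for illustration, not code from the article. The function returns the deepest recursion level reached so the bound can be checked.

```python
def partition(a, p, r):
    # Lomuto partition (assumed): pivot is the last element a[r].
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def quicksort_smaller_first(a, p, r, depth=1):
    # Recurse only on the smaller side; the loop handles the larger side
    # by narrowing [p, r] in place. Returns the deepest level reached.
    max_depth = depth
    while p < r:
        q = partition(a, p, r)
        if q - p < r - q:
            # Left side is smaller: recurse on it, keep looping on the right.
            d = quicksort_smaller_first(a, p, q - 1, depth + 1)
            p = q + 1
        else:
            # Right side is smaller (or equal): recurse on it, loop on the left.
            d = quicksort_smaller_first(a, q + 1, r, depth + 1)
            r = q - 1
        max_depth = max(max_depth, d)
    return max_depth


data = list(range(200))            # sorted input, worst case for this pivot
depth = quicksort_smaller_first(data, 0, len(data) - 1)
```

Each recursive call receives at most half of the current range, so the depth stays logarithmic even on sorted input, where the plain version with the same pivot choice would nest once per element. The running time in that worst case is still quadratic; only the stack usage is bounded.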