time and space complexity of libstdc++ std::inplace_merge (__merge_without_buffer)


What is the time and space complexity of the algorithm implemented by __merge_without_buffer called by libstdc++ std::inplace_merge? More specifically, I'm interested in these:

  • Upper limit for the recursion depth. Is it <= 2 * ceil(log2(n)) ?

  • Upper limit for the number of comparisons. Is it <= 2 * ceil(log2(n)) * n ?

  • Upper limit for the number of swaps. Is it <= 2 * ceil(log2(n)) * n ?

Ideally the algorithm is a well-known one described in a published paper, and the paper proves these upper limits. However, the libstdc++ source code doesn't cite any source.

For the "is it <= ..." questions above, I arrived at these formulas as conjectures: by intuition, by running the C program instrumented with counters on many inputs, and by checking the conjectured upper limits against those inputs. However, I wasn't able to prove any of my conjectures in the general case. For the recursion depth formula, changing the constant from 2 to 1.5 makes it false: I've found counterexamples.

Please note that I'm not interested in O(...) classes; I want an actual upper limit with a specific constant (hopefully 2 or less). I also need proofs for the upper limit formulas. A citation is enough as a proof.

Based on comments in and documentation of the implementations, I think that both the number of comparisons and the number of swaps are O(log2(n) * n). However, there is no proof, this is not an upper limit formula, and a formula for the recursion depth is still missing.

It's quite easy to prove that in a regular (non-in-place) merge algorithm there is no recursion, the number of comparisons is <= n - 1, there are no swaps, and the number of copies is n. However, I'm specifically interested in the in-place merge algorithm.
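
For reference, here is a minimal sketch of such a buffered merge (my own illustration, not taken from any of the implementations listed below; merge_copy_ is just an illustrative name). It merges the sorted runs [a,b) and [b,c) into a separate output array, copying each of the n elements exactly once and comparing at most n - 1 times:

/* Sketch of a regular (non-in-place) merge: merge the sorted runs [a,b) and
 * [b,c) into out[0..c-a). Each element is copied exactly once (n copies), and
 * only the first loop compares, once per copied element, so there are at most
 * n - 1 comparisons in total. */
static void merge_copy_(const int* a, const int* b, const int* c, int* out)
{
    const int* p = a;   /* cursor in the first sorted run  [a,b) */
    const int* q = b;   /* cursor in the second sorted run [b,c) */

    while (p != b && q != c)                /* one comparison per copied element */
        *out++ = (*q < *p) ? *q++ : *p++;   /* take *p on ties to keep stability */
    while (p != b)                          /* leftovers: copies only */
        *out++ = *p++;
    while (q != c)
        *out++ = *q++;
}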

Above n is the total number of elements in the input/output array (== n1 + n2 == len1 + len2 == __len1 + __len2).

When calculating the number of swaps used by the rotate function, use 1 less than the total number of elements passed to the rotate function as an upper bound.

Here are some implementations of this algorithm:

  • C++: __merge_without_buffer function in libstdc++, called by std::inplace_merge.
  • Java: merge, reimplementation of the C++ __merge_without_buffer function in Java.
  • C: ip_merge_, simple reimplementation of the C++ __merge_without_buffer function in C.
  • C: ip_mergesort: simple reimplementation of the C++ __merge_without_buffer function in C (based on ip_merge_ above), plus an in-place mergesort implementation with an interface compatible with qsort(3).

Please note that the implementations above are equivalent in terms of recursion depth, number of comparisons, and number of swaps (as defined above).

This blog post explains a different in-place mergesort. That algorithm only swaps adjacent memory regions of equal size.

The article Practical In-Place Merging (also explained in a C++ code comment here) describes a different in-place merge algorithm.

For completeness, I copy the C implementation here:

#define SWAP(type, a, b) \
    do { type t=(a);(a)=(b);(b)=t; } while (0)

static void reverse_(int* a, int* b)
/* reverse the sequence [a,b). */
{
    for ( --b; a < b; a++, b-- )
       SWAP(int, *a, *b);
}
static int* rotate_(int* a, int* b, int* c)
/* swap the sequence [a,b) with [b,c). */
{
    if (a != b && b != c)
     {
       reverse_(a, b);
       reverse_(b, c);
       reverse_(a, c);
     }
    return a + (c - b);
}

static int* lower_bound_(int* a, int* b, const int key)
/* find first element not less than @p key in sorted sequence or end of
 * sequence (@p b) if not found. */
{
    int i;
    for ( i = b-a; i != 0; i /= 2 )
     {
       int* mid = a + i/2;
       if (*mid < key)
          a = mid + 1, i--;
     }
    return a;
}
static int* upper_bound_(int* a, int* b, const int key)
/* find first element greater than @p key in sorted sequence or end of
 * sequence (@p b) if not found. */
{
    int i;
    for ( i = b-a; i != 0; i /= 2 )
     {
       int* mid = a + i/2;
       if (*mid <= key)
          a = mid + 1, i--;
     }
    return a;
}

static void ip_merge_(int* a, int* b, int* c)
/* inplace merge. */
{
    int n1 = b - a;
    int n2 = c - b;

    if (n1 == 0 || n2 == 0)
       return;
    if (n1 == 1 && n2 == 1)
     {
       if (*b < *a)
          SWAP(int, *a, *b);
     }
    else
     {
       int* p, * q;

       if (n1 <= n2)
          p = upper_bound_(a, b, *(q = b+n2/2));
       else
          q = lower_bound_(b, c, *(p = a+n1/2));
       b = rotate_(p, b, q);

       ip_merge_(a, p, b);
       ip_merge_(b, q, c);
     }
}
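
For completeness, a minimal test driver for the code above could look like this (just a sketch, assuming it is appended to the same file; counters for swaps and comparisons can be added separately, e.g. a global counter incremented inside the SWAP macro and another one incremented before each key comparison):

#include <stdio.h>

int main(void)
{
    int a[] = {1, 4, 7, 9, 2, 3, 5, 6, 8};   /* [0,4) and [4,9) are each sorted */
    int n1 = 4;
    int n = (int)(sizeof(a) / sizeof(a[0]));
    int i;

    ip_merge_(a, a + n1, a + n);             /* merge the two runs in place */
    for (i = 0; i < n; i++)
        printf("%d ", a[i]);                 /* expected output: 1 2 3 4 5 6 7 8 9 */
    printf("\n");
    return 0;
}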

Solution

  • Upper limit for the recursion depth.

    It's relatively easy to prove that the sizes of the two subranges (b-a and c-b) passed to the recursive calls are between floor((c-a)/4) and ceil((c-a)*3/4), and that their sum is c-a.

    From this it's easy to prove that the recursion depth is O(log(c-a)). More specifically, it's less than log(c-a)/log(4/3)+1.
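
    A rough sketch of that step, ignoring the ceil rounding: with m = c-a, a call at recursion depth d handles at most m*(3/4)^(d-1) elements, and a call only recurses while its range still holds at least 3 elements, so reaching depth d requires

    $$ m \cdot (3/4)^{\,d-2} \ge 3 \quad\Longrightarrow\quad d \le \frac{\log(m/3)}{\log(4/3)} + 2 < \frac{\log m}{\log(4/3)} + 1 . $$

    (The last inequality holds because log(3)/log(4/3) > 1.)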

  • Upper limit for the number of swaps.

    It's easy to prove that the total number of swaps needed by rotate_(p, b, q) is at most q-p. It's also easy to prove that q-p <= ceil((c-a)*3/4).
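
    For the first claim, count the swaps of the three reverse_ calls inside rotate_ (reversing m elements takes floor(m/2) swaps); with m1 = b-p and m2 = q-b this gives

    $$ \lfloor m_1/2 \rfloor + \lfloor m_2/2 \rfloor + \lfloor (m_1+m_2)/2 \rfloor \;\le\; m_1 + m_2 \;=\; q - p . $$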

    To get the total number of swaps, we have to count the rotate_ calls in the top-level call and in all the recursive calls. It's relatively easy to prove that the total is O(n*log(n)): at each recursion depth the merged ranges are disjoint, so the rotate_ calls at that depth perform at most n swaps in total, and there are only O(log(n)) depths. After that it's possible to prove that the total is at most 3 * n * log2(n).
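
    Combining the per-depth bound of n swaps with the recursion depth bound above gives a rough sketch of that constant (small n can be checked directly):

    $$ \text{swaps} \;\le\; n \cdot \text{depth} \;<\; n \left( \frac{\log_2 n}{\log_2(4/3)} + 1 \right) \;\approx\; 2.41\, n \log_2 n + n \;\le\; 3\, n \log_2 n \quad (n \ge 4). $$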

  • Upper limit for the number of comparisons.

    It's easy to prove that the number of comparisons in each upper_bound_ or lower_bound_ call is at most ceil(log2(floor((c-a)/2)+1)). That's less than the number of swaps in the corresponding rotate_ call. Thus the number of comparisons is less than the number of swaps. Actually it's much less, only a tiny bit more than O(n), but I wasn't able to prove it.
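
    A rough sketch of the first bound: the binary search always runs over the shorter of the two runs, i.e. over m <= floor((c-a)/2) elements; the loop in lower_bound_/upper_bound_ performs exactly one comparison per iteration and at least halves i each time, so

    $$ \text{comparisons} \;\le\; \lfloor \log_2 m \rfloor + 1 \;=\; \lceil \log_2(m+1) \rceil \;\le\; \left\lceil \log_2\!\left( \lfloor (c-a)/2 \rfloor + 1 \right) \right\rceil . $$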

  • Number of swaps in a mergesort.

    I used the following algorithm to implement an in-place mergesort of n items:

    void ip_mergesort_(int *base, int n) {
      int a, b, d;
      for (d = 1; d < n; d <<= 1) {
        for (a = 0; a < n - d; a = b) {
          b = a + (d << 1);
          ip_merge_(base + a, base + (a + d), base + (b > n ? n : b));
        }
      }
    }
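
    For reference, a driver along these lines can be used to sanity-check the sort (a sketch only, not the actual measurement harness; it assumes it is compiled in the same file as ip_merge_ and ip_mergesort_ above):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
      int n;
      for (n = 1; n <= (1 << 16); n <<= 2) {
        int *a = malloc((size_t)n * sizeof *a);
        int i, sorted = 1;
        if (!a) return 1;
        for (i = 0; i < n; i++)
          a[i] = rand();                   /* random input of size n */
        ip_mergesort_(a, n);
        for (i = 1; i < n; i++)
          if (a[i - 1] > a[i])
            sorted = 0;                    /* flag any out-of-order pair */
        printf("n=%d %s\n", n, sorted ? "sorted" : "NOT sorted");
        free(a);
      }
      return 0;
    }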
    

    I was able to prove that the number of swaps in this mergesort is O(n*log(n)*log(n)). I was able to measure (but not prove) these bounds:

    • If n <= 1, then 0.
    • If n == 2, then at most 1.
    • If n >= 2, then less than 0.75 * n * log2(n) * log2(n).

  • Number of comparisons in a mergesort.

    I was able to prove that the number of comparisons in the mergesort above is O(n*log(n)*log(n)). I was able to measure (but not prove) these bounds:

    • If n <= 1, then 0.
    • If n == 2, then at most 1.
    • If n >= 2, then less than 0.5 * n * log2(n) * log2(n).
    • If 2 <= n <= 2**32, then less than 1.9683 * n * log2(n).
    • If 2 <= n <= 2**64, then less than 1.9998 * n * log2(n).
    • If 2 <= n <= 2**128, then less than 2.0154 * n * log2(n).
    • If 2 <= n <= 2**256, then less than 2.0232 * n * log2(n).
    • If 2 <= n <= 2**512, then less than 2.0270 * n * log2(n).

    More details of the analysis and the measurements are available here: C code with comments, Python script doing the measurements.