Search code examples
algorithmsortingbubble-sort

Number of required passes to fully sort using BubbleSort?


I'm investigating the mathematics behind the number of passes required to sort each of the possible combinations of integers [1,n] in an array[n].

For example, with n = 3, there are 3! = 6 possible permutations of the numbers:

1,2,3 - 1,3,2 - 2,1,3 - 2,3,1 - 3,1,2 - 3,2,1.
  • One of these initial permutations requires k = 0 passes (1,2,3) to sort the array into ascending order;
  • Three of them require k = 1 pass (1,3,2 - 2,1,3 - 3,1,2) and
  • Two require k = 2 passes (2,3,1 - 3,2,1).

Basically, I want to be able to derive mathematically the set of numbers of passes {k} for a given n.

For n = 4, the number of initial permutations, P, that require k passes is P(n,k) = 1,7,10,6 for k = 0,1,2,3.

There is of course only ever 1 initial permutation for k = 0 (already in ascending order), ie P(n,0) = 1, and the number of initial permutations for the highest value of k (which is n-1) is k!, ie P(n,n-1) = (n-1)! . Or, at least I think so...

I feel like this is simpler than I'm making it and involves factorial formulae.


Solution

  • An algorithm for generating permutations is Heap's algorithm. This code is a brute-force method to calculate the permutations of n objects. For each configuration, the number of passes is the maximum length any element is from it's sorted position, O(n). Given n, this gives all the the P(n, k) by doing a histogram; it's running time is exponential in n, (in C.)

    #include <stdlib.h> /* EXIT */
    #include <stdio.h>  /* printf */
    #include <assert.h> /* assert */
    #include <errno.h>  /* errno, ERANGE */
    
    typedef void (*PermuteFunc)(const size_t a_size);
    
    unsigned a[12];
    const size_t a_max = sizeof a / sizeof *a;
    
    /* https://en.wikipedia.org/wiki/Heap%27s_algorithm#cite_note-3 */
    static void heaps_r(const size_t a_size, const unsigned k,
        const PermuteFunc func) {
        size_t i, j;
        assert(k && a_size);
        if(k == 1) { func(a_size); return; }
        for(i = 0; i < k; i++) {
            heaps_r(a_size, k - 1, func);
            if(i >= k - 1) continue;
            j = (k & 1) ? 0 : i; /* Odd/even. */
            a[j] ^= a[k-1], a[k-1] ^= a[j], a[j] ^= a[k-1]; /* Swap. */
        }
    }
    
    /* Generates all permutations of size `a_size` and passes them to `func`.
     @return Success. */
    static int heaps(const size_t a_size, const PermuteFunc func) {
        size_t i;
        assert(func);
        if(!a_size || a_size > a_max) return errno = ERANGE, 0;
        for(i = 0; i < a_size; i++) a[i] = i + 1; /* Distinct numbers. */
        heaps_r(a_size, a_size, func);
        return 1;
    }
    
    static unsigned histogram[256]; /* This is good enough, right? */
    static size_t histogram_size = sizeof histogram / sizeof *histogram;
    
    /* @implements PermuteFunc */
    static void print(const size_t a_size) {
        size_t i, bin = 0;
        assert(a && a_size);
        for(i = 0; i < a_size; i++) printf("%d ", a[i]);
    #if 0 /* I misread the question. */
        /* O(n^2) way to calculate the Kendall tau distance. */
        for(i = 0; i < a_size; i++) {
            size_t j;
            for(j = i + 1; j < a_size; j++) if(a[i] > a[j]) bin++;
        }
    #else
        /* Calculate the number of passes bubble-sort needs to make. */
        for(i = 0; i < a_size; i++) {
            size_t passes = abs(a[i] - i);
            if(passes > bin) bin = passes;
        }
    #endif
        if(bin >= histogram_size) {
            fprintf(stderr, "Histogram too small for %d.\n", (unsigned long)bin);
            return;
        }
        histogram[bin]++;
        printf("-> %d\n", bin);
    }
    
    int main(int argc, char **argv) {
        int n;
        size_t k;
        const char *err = 0;
        do {
            if(argc != 2 || (n = atoi(argv[1]), n <= 0))
                { errno = EDOM; err = "Argument needed"; break; }
            if(!heaps(n, &print)) { err = "Heap's"; break; }
            printf("\n");
            for(k = 0; k < histogram_size; k++) if(histogram[k])
                printf("P(%d, %lu) = %u\n", n, (unsigned long)k, histogram[k]);
        } while(0);
        return err ? (perror(err), EXIT_FAILURE) : EXIT_SUCCESS;
    }
    

    Passing 4, I get,

    P(4, 1) = 1
    P(4, 2) = 7
    P(4, 3) = 10
    P(4, 4) = 6
    

    I Googled the Kendall tau distance code and notice that it's the coefficients in expansion of Product_{i=0..n-1} (1 + x + ... + x^i), however the code with the passes of bubble-sort doesn't match any documents.