Search code examples
c++algorithmsortingquicksort

Partition for randomised quicksort (with few unique elements)


I've been tasked to write a partition function for a randomised quicksort with few elements (optimising it by including 3 partitions instead of 2). I've tried implementing my version of it, and have found that it doesn't pass the test cases.

However, by using a classmates' version of partition, it seems to work. Conceptually, I don't see the difference between his and mine, and I can't tell what is it with my version that causes it to break. I wrote it with the concept as him (I think), which involves using counters (j and k) to partition the arrays into 3.

I would greatly appreciate anybody that could point out why mine doesn't work, and what I should do to minimise the chances of these again. I feel like this learning point will be important to me as a developer, thank you!

For comparison, there will be 3 blocks of code, the snippet directly below will be my version of partition, following which will be my classmates' version and lastly will be the actual algorithm which runs our partition.

My version (Does not work)

vector<int> partition2(vector<int> &a, int l, int r) {
  int x = a[l];
  int j = l;
  int k = r;

  vector<int> m(2);

  // I've tried changing i = l + 1
  for (int i = l; i <= r; i++) {
    if (a[i] < x) {
      swap(a[i], a[j]);
      j++;
    }

    else if (a[i] > x) {
      swap(a[i], a[k]);
      k--;
    }
  }

  // I've tried removing this
  swap(a[l], a[j]);

  m[0] = j - 1;
  m[1] = k + 1;

  return m;
}

My classmates' (which works)

vector<int> partition2(vector<int> &a, int l, int r) {
    int x = a[l];
    int p_l = l;
    int i = l;
    int p_e = r;
    vector<int> m(2);
    while (i <= p_e) {
        if (a[i] < x) {
            swap(a[p_l], a[i]);
            p_l++;
            i++;
        } else if (a[i] == x) {
            i++;
        } else {
            swap(a[i], a[p_e]);
            p_e -= 1;
        }
        m[0] = p_l - 1;
        m[1] = p_e + 1;
    }
    return m;
}

Actual quick sort algorithm

void randomized_quick_sort(vector<int> &a, int l, int r) {
  if (l >= r) {
    return;
  }

  int k = l + rand() % (r - l + 1);
  swap(a[l], a[k]);
  vector<int> m = partition2(a, l, r);

  randomized_quick_sort(a, l, m[0]);
  randomized_quick_sort(a, m[1], r);
}

Solution

  • The difference between the two functions for three-way partition is that your code advances i in each pass through the loop, but your classmate's function advances i only when the value at position i is less or equal to the pivot.

    Let's go through an example array. The first value, 3, is the pivot. The letters indicate the positions of the variables after each pass through the loop.

    j               k
    3   1   5   2   4
        i
    

    The next value is smaller: swap it to the left side and advance j:

        j           k
    1   3   5   2   4
            i
    

    The next value, 5, is greater, so it goes to the right:

        j       k
    1   3   4   2   5
                i
    

    That's the bad move: Your i has now skipped over the 4, which must go to the right part, too. Your classmate's code does not advance the i here and catches the 4 in the next pass.

    Your loop has some invariants, things that must be true after all passes:

    • All items with an index lower than i are smaller than the pivot.
    • All items with an index greater than k are greater than the pivot.
    • All items with an index from j to i - 1 are equal to the pivot.
    • All items from i to k have not yet been processed.

    You can also determine the loop conditions from that:

    • The pivot is the leftmost element by definition, because the quicksort function swaps it there. It must belong to the group of elements that are equal to the pivot, so you can start your loop at l + 1.
    • All items starting from k are already in the correct part of the array. That means that you can stop when i reaches k. Going further will needlessly swap elements around inside the "greater than" partition and also move k, which will return wrong partition boundaries.