How to reconstruct partitions in the Karmarkar-Karp heuristic multi-way partitioning algorithm?

I'm trying to implement the k-partitioning version of the Karmarkar-Karp heuristic number partitioning algorithm. But I'm struggling with its second phase where number partitions are reconstructed from the resulting difference-set.

The only source that I can find that thoroughly describes the second phase with some pseudocode is page 58 of the thesis: Load balancing of multiphysics simulations by multi-criteria graph partitioning.

Given the multiset S = [1,7,5,10,9,3] to partition it into three (k=3) subsets the algorithm goes through 2 phases.

Phase 1:

It first sorts S in descending order:

S = [10,9,7,5,3,1]

Then, each number in S is transformed into a tuple of size k, the first entry being the number itself, and the others being zeros:

M = [[10,0,0],[9,0,0],[7,0,0],[5,0,0],[3,0,0],[1,0,0]]

This creates the matrix M.

At each step, (a, b, c) and (x, y, z) are removed from M , and a new tuple is formed in place: E := (a + x, b + y, c + z). The tuple is normalized by its minimum and sorted in descending order, before inserting it back in M in order. At each step the algorithm memorizes the array of old values ([a, x], [b, y], [c, z]) and the number used for normalization by putting them into two stacks.

M = [[10,0,0],[9,0,0],[7,0,0],[5,0,0],[3,0,0],[1,0,0]]
Old values: []
Norm. numbers: []


M = [[10,9,0],[7,0,0],[5,0,0],[3,0,0],[1,0,0]]
Old values: [([10,0],[0,0],[0,9])]
Norm. numbers: [0]


M = [[5,0,0],[3,0,0],[3,2,0],[1,0,0]]
Old values: [([10,0],[9,0],[0,7]), ([10,0],[0,0],[0,9])]
Norm. numbers: [7,0]


M = [[5,3,0],[3,2,0],[1,0,0]]
Old values: [([5,0],[0,0],[0,3]), ([10,0],[9,0],[0,7]), ([10,0],[0,0],[0,9])]
Norm. numbers: [0,7,0]


M = [[2,2,0],[1,0,0]]
Old values: [([5,0],[3,2],[0,3]), ([5,0],[0,0],[0,3]), ([10,0],[9,0],[0,7]), ([10,0],[0,0],[0,9])]
Norm. numbers: [3,0,7,0]


M = [[1,1,0]]
Old values: [([2,0],[2,0],[0,1]), ([5,0],[3,2],[0,3]), ([5,0],[0,0],[0,3]), ([10,0],[9,0],[0,7]), ([10,0],[0,0],[0,9])]
Norm. numbers: [1,3,0,7,0]

Phase 2:

The second phase builds partitions from the resulting difference-set, memorized old values and normalization numbers.

At each step it pops the memorized old values array and normalization number:

M = [[1],[1],[0]]
Old values: [[2,0],[2,0],[0,1]] 
Norm. number: 1

For each tuple from the old values array it calculates a value to search in M:

[2,0]: 2 + 0 - 1 = 1

Then search for a partition in M that contains that value and replaces the value with the tuple:

[[2,0],[1],[0]]


[2,0]: 2 + 0 - 1 = 1
=> [[2,0],[2,0],[0]]

At each iteration it never puts two tuples into the same partition. So [0,1] goes into the last partition:

[0,1]: 0 + 1 - 1 = 0
=> [[2,0],[2,0],[0,1]]

And so on:

M = [[2,0],[2,0],[0,1]]
Old values: [[5,0],[3,2],[0,3]] 
Norm. number: 3

M = [[0,0,5],[0,3,2],[1,0,3]]
Old values: [[5,0],[0,0],[0,3]] 
Norm. number: 0

Here comes the problem:

M = [[0,0,0,5],[3,2,0,0],[1,0,0,3]]
Old values: [[10,0],[9,0],[0,7]] 
Norm. number: 7

[10,0]: 10 + 0 - 7 = 3
=> [[0,0,0,5],[10,0,2,0,0],[1,0,0,3]]

[9,0]: 9 + 0 - 7 = 2
=> ???

I've already put the [10,0] tuple into the [3,2,0,0] partition at this iteration, so I can't put another tuple into it. But M doesn't have another partition that contains the value 2.

If I allow to put several tuples into the same partition when such a situation arises then I get resulting partitions that are far from optimal:

[[0,0,0,0,5,7],[0,0,0,0,0,0,10,9],[1,0,0,3]]

I think I can solve the problem using Maximum Bipartite Matching but it feels like a dirty hack.

Am I missing some important step or is there a better way to reconstruct partitions?

Conclusion:

David Eisenstat provided an elegant and fast OOP implementation of the algorithm in C++.

But for my job I needed it rewritten in PHP. A direct translation of the implementation proved to be memory inefficient and slow. It was memory inefficient because of wrapping of every number into a Subset object. And it was slow because sorting an array of objects in PHP is an order of magnitude slower than sorting an array of numbers. For example partitioning of array of 500000 numbers in 100 subsets using this approach took 138 seconds while my implementation of the Greedy algorithm did it in 9 seconds.

So, after two days in a profiler I rewrote the PHP implementation using arrays. It looks ugly, but it performs better than my implementation of the Greedy algorithm when k is low and not so drastically slow when k is high:

K: 2
Greedy: 330 (seconds)
Karmarkar–Karp: 6

K: 4
Greedy: 177
Karmarkar–Karp: 6

K: 8
Greedy: 85
Karmarkar–Karp: 6

K: 16
Greedy: 43
Karmarkar–Karp: 8

K: 32
Greedy: 21
Karmarkar–Karp: 11

K: 64
Greedy: 11
Karmarkar–Karp: 20

K: 128
Greedy: 8
Karmarkar–Karp: 33

K: 256
Greedy: 3
Karmarkar–Karp: 61

I hope both the David Eisenstat answer and my solution prove useful.

<?php

function greedy(array $array, int $k) : array
{
    $result = new \Ds\PriorityQueue();

    for ($i = 0; $i < $k; $i++)
    {
        $result->push([], 0);
    }

    sort($array, SORT_NUMERIC);

    while (!empty($array))
    {
        $number = array_pop($array);
        $smallestSumArray = $result->pop();

        $smallestSumArray [] = $number;
        $sum = array_sum($smallestSumArray);
        $result->push($smallestSumArray, -$sum);
    }

    return $result->toArray();
}

function karmarkarKarp(array $array, int $k) : array
{
    $id = PHP_INT_MIN;

    $heap = new \Ds\PriorityQueue();
    $idToNumbers = new \Ds\Map();
    $idToSum = new \Ds\Map();

    /**
     * Convert every number into an ID that is connected with a numbers array using $idToNumbers map
     * and with a sum using $idToSum map
     */
    foreach ($array as $number)
    {
        $idToNumbers[$id] = new \Ds\Stack([$number]);
        $idToSum[$id] = $number;
        $heap->push([$id], $number);
        ++$id;
    }
 
    //Partitioning:

    $sumToId = [];

    while ($heap->count() > 1)
    {
        /** @var array $a */
        $a = $heap->pop();
        /** @var array $b */
        $b = $heap->pop();

        for ($i = 0; $i < $k; $i++)
        {
            $reverseI = $k - 1 - $i;

            if (!isset($a[$i]) && !isset($b[$reverseI])) // Instead of filling k-tuple with zeroes just check that a position is set
            {
                continue;
            }

            if (!isset($a[$i]) || !isset($b[$reverseI]))
            {
                $Ai = $a[$i] ?? $b[$reverseI];
                unset($a[$i], $b[$reverseI]);
                $sum = $idToSum[$Ai];

                isset($sumToId[$sum]) ? $sumToId[$sum] []= $Ai : $sumToId[$sum] = [$Ai];

                continue;
            }

            /** @var int $Ai */
            $Ai = $a[$i];

            /** @var int $Bk */
            $Bk = $b[$reverseI];

            unset($a[$i], $b[$reverseI]);

            $aNumbers = $idToNumbers[$Ai];
            $bNumbers = $idToNumbers[$Bk];

            while (!$bNumbers->isEmpty())
            {
                $aNumbers->push($bNumbers->pop());
            }

            $sum = $idToSum[$Ai] + $idToSum[$Bk];

            $idToSum[$Ai] = $sum;

            isset($sumToId[$sum]) ? $sumToId[$sum] []= $Ai : $sumToId[$sum] = [$Ai];

            $idToNumbers->remove($Bk);
            $idToSum->remove($Bk);
        }

        krsort($sumToId); // It's faster than using usort() to sort $a by sums in $idToSum map

        $a = array_merge(...$sumToId);

        $sumToId = [];

        $difference = $idToSum[$a[0]] - (isset($a[$k -1]) ? $idToSum[$a[$k -1]] : 0);

        $heap->push($a, $difference);
    }

    //Preparing the resulting array:

    /** @var array $last */
    $last = $heap->pop();

    array_walk($last, function (&$item) use ($idToNumbers) {
        /** @var \Ds\Stack $numbersStack */
        $numbersStack = $idToNumbers[$item];

        $item = [];

        while (!$numbersStack->isEmpty())
        {
            $item []= $numbersStack->pop();
        }
    });

    return $last;
}

$randomArray = [];

for ($i = 0; $i < 500000; $i++)
{
    $randomArray []= random_int(1, 6000);
}

for ($k = 2; $k <= 256; $k *= 2)
{
    echo PHP_EOL . 'K: ' . $k;

    $time = time();
    $result = greedy($randomArray, $k);
    echo PHP_EOL . 'Greedy: ' . (time() - $time);

    gc_mem_caches();

    $time = time();
    $result = karmarkarKarp($randomArray, $k);
    echo PHP_EOL . 'Karmarkar–Karp: ' . (time() - $time) . PHP_EOL;

    gc_mem_caches();
}

Solution

I'd implement this algorithm using one phase, adding data structures to track the subsets associated with each sum. In C++:

#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iostream>
#include <list>
#include <memory>
#include <vector>

namespace {

using Number = std::uint64_t;

class Subset {
 public:
  Subset() : numbers_{}, sum_{} {}
  explicit Subset(const Number number) : numbers_{number}, sum_{number} {}

  Subset(Subset&&) = default;
  Subset& operator=(Subset&&) = default;

  const std::list<Number>& numbers() const { return numbers_; }

  Number sum() const { return sum_; }

  void Merge(Subset other) {
    numbers_.splice(numbers_.end(), other.numbers_);
    sum_ += other.sum_;
  }

 private:
  Subset(const Subset&) = delete;
  Subset& operator=(const Subset&) = delete;

  std::list<Number> numbers_;
  Number sum_;
};

std::ostream& operator<<(std::ostream& stream, const Subset& subset) {
  stream << '[';
  if (!subset.numbers().empty()) {
    auto it{subset.numbers().begin()};
    stream << *it;
    for (++it; it != subset.numbers().end(); ++it) {
      stream << ',' << *it;
    }
  }
  stream << ']';
  return stream;
}

struct SubsetSumGreater {
  bool operator()(const Subset& a, const Subset& b) const {
    return a.sum() > b.sum();
  }
};

class Partition {
 public:
  Partition(const Number number, const std::size_t k) : subsets_(k) {
    assert(k > 0);
    subsets_.front().Merge(Subset{number});
  }

  Partition(Partition&&) = default;
  Partition& operator=(Partition&&) = default;

  const std::vector<Subset>& subsets() const { return subsets_; }

  Number difference() const {
    return subsets_.front().sum() - subsets_.back().sum();
  }

  void Merge(Partition other) {
    assert(subsets_.size() == other.subsets_.size());
    auto it{subsets_.begin()};
    auto other_it{other.subsets_.rbegin()};
    while (it != subsets_.end() && other_it != other.subsets_.rend()) {
      it->Merge(std::move(*other_it));
      ++it;
      ++other_it;
    }
    std::sort(subsets_.begin(), subsets_.end(), SubsetSumGreater{});
  }

 private:
  Partition(const Partition&) = delete;
  Partition& operator=(const Partition&) = delete;

  std::vector<Subset> subsets_;
};

std::ostream& operator<<(std::ostream& stream, const Partition& partition) {
  stream << '[';
  if (!partition.subsets().empty()) {
    auto it{partition.subsets().begin()};
    stream << *it;
    for (++it; it != partition.subsets().end(); ++it) {
      stream << ',' << *it;
    }
  }
  stream << ']';
  return stream;
}

struct DifferenceLess {
  bool operator()(const Partition& a, const Partition& b) const {
    return a.difference() < b.difference();
  }
};

Partition KarmarkarKarp(const std::vector<Number>& numbers,
                        const std::size_t k) {
  assert(!numbers.empty());
  assert(k > 0);
  std::vector<Partition> heap;
  heap.reserve(numbers.size());
  for (const Number number : numbers) {
    heap.emplace_back(number, k);
  }
  std::make_heap(heap.begin(), heap.end(), DifferenceLess{});
  while (heap.size() > 1) {
    std::pop_heap(heap.begin(), heap.end(), DifferenceLess{});
    Partition partition{std::move(heap.back())};
    heap.pop_back();
    std::pop_heap(heap.begin(), heap.end(), DifferenceLess{});
    heap.back().Merge(std::move(partition));
    std::push_heap(heap.begin(), heap.end(), DifferenceLess{});
  }
  return std::move(heap.front());
}

}  // namespace

int main() { std::cout << KarmarkarKarp({1, 7, 5, 10, 9, 3}, 3) << '\n'; }