Does an algorithm exist to find the minimal sum of non-intersecting values in a 2D array?


I'm looking for a fast algorithm to determine a particular minimal property of a given 2D array - the sum of the smallest values that have no rows or columns in common. I'm sure this must have a name but I have no idea what it's called.

I have a string-matching system that splits an input string on spaces, compares it to a corpus of search values (also split on spaces), and returns a matrix of distances between the tokens within each string. I want to reduce this matrix to a single aggregate distance by taking the minimum combination of distances that doesn't re-use any input or output token.

Examples:

{ 1, 2 }   => 5 (either 1+4, or 3+2)
{ 3, 4 }

{ 0, 2 }   => 6 (because 2+4 < 0+8)
{ 4, 8 } 

{ 1, 0, 0 }
{ 0, 1, 0 } => 0
{ 0, 0, 1 }

{ 2, 3, 4 }
{ 3, 2, 4 } => 6 (2+2+2)
{ 4, 3, 2 } 
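
To pin down exactly what these examples compute, here is a minimal brute-force sketch. It is written in Python rather than C#, and the function name `min_assignment_bruteforce` is made up for illustration. It assumes a non-empty rectangular matrix and tries every way of pairing rows with distinct columns, so it is only usable as a reference on very small inputs:

```python
from itertools import permutations

def min_assignment_bruteforce(matrix):
    """Smallest sum of entries such that no two share a row or a column.

    Picks min(rows, cols) entries. Tries every assignment, so the running
    time grows factorially with the matrix size.
    """
    rows, cols = len(matrix), len(matrix[0])
    if rows > cols:
        # Transpose so there are at least as many columns as rows.
        matrix = [list(col) for col in zip(*matrix)]
        rows, cols = cols, rows
    # Each tuple `chosen` assigns a distinct column to every row.
    return min(
        sum(matrix[r][c] for r, c in enumerate(chosen))
        for chosen in permutations(range(cols), rows)
    )

print(min_assignment_bruteforce([[1, 2], [3, 4]]))                   # 5
print(min_assignment_bruteforce([[0, 2], [4, 8]]))                   # 6
print(min_assignment_bruteforce([[2, 3, 4], [3, 2, 4], [4, 3, 2]]))  # 6
```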

The naive algorithm I've been using until now looks like this (C#):

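using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container class: extension methods must live in a static class.
public static class ArrayExtensions {
  // Greedy sweep: repeatedly take the smallest value whose row and column are both unused.
  public static int Minimux(this int[,] array) {
    var xUsed = new bool[array.GetLength(0)];
    var yUsed = new bool[array.GetLength(1)];
    var xMax = array.GetLength(0);
    var yMax = array.GetLength(1);
    var minima = new List<int>();
    // One value is picked per row or per column, whichever dimension is smaller.
    var limit = Math.Min(xMax, yMax);
    int xMin = 0, yMin = 0;
    while (minima.Count < limit) {
      var vMin = Int32.MaxValue;
      for (var x = 0; x < xMax; x++) {
        for (var y = 0; y < yMax; y++) {
          // Skip cells in a used row/column, and cells no smaller than the current best.
          if (xUsed[x] || yUsed[y] || array[x, y] >= vMin) continue;
          vMin = array[x, y];
          xMin = x;
          yMin = y;
        }
      }
      // Lock in this pass's minimum and retire its row and column.
      xUsed[xMin] = true;
      yUsed[yMin] = true;
      minima.Add(vMin);
    }
    return minima.Sum();
  }
}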

It basically sweeps the array, and as it finds each minimal value it marks that row/column combination as 'used' so it won't be considered again. Once there are as many minima in the list as there are elements in the shorter array dimension, it returns the sum of those minima.

The problem is that it breaks down on cases like this:

{ 0, 0, 0 }
{ 0, 0, 0 } => 3 (when it should be returning 1)
{ 1, 2, 3 } 

By the time the sweep reaches the last row, it has already marked columns 0 and 1 as 'used', so the minimum unused value in row 2 is 3 when it should actually use the 1.

Does a standard algorithm exist for performing this operation?


Solution

  • Yes, there is a standard algorithm that solves exactly this problem: the Hungarian algorithm (also known as the Kuhn-Munkres algorithm). The problem itself is known as the assignment problem.
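
As a rough illustration of the Hungarian algorithm, here is a minimal sketch of one common formulation, based on row/column potentials and shortest augmenting paths. It is in Python rather than the question's C#, the function name `hungarian_min_sum` is invented for this example, and it assumes a non-empty rectangular matrix of numbers. With n the smaller dimension and m the larger, it runs in O(n²·m) time:

```python
def hungarian_min_sum(cost):
    """Minimum sum of entries with no shared row or column (assignment problem)."""
    n, m = len(cost), len(cost[0])
    if n > m:
        # The method below needs rows <= columns, so transpose if necessary.
        cost = [list(col) for col in zip(*cost)]
        n, m = m, n
    INF = float("inf")
    u = [0] * (n + 1)    # row potentials (1-indexed)
    v = [0] * (m + 1)    # column potentials (1-indexed; index 0 is a dummy column)
    p = [0] * (m + 1)    # p[j] = row currently matched to column j, 0 if none
    way = [0] * (m + 1)  # way[j] = previous column on the augmenting path to j
    for i in range(1, n + 1):
        # Insert row i by growing a shortest augmenting path from the dummy column.
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = INF
            j1 = 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]  # reduced cost
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            # Shift potentials so the cheapest reachable edge becomes tight.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached an unmatched column: path is complete
                break
        # Flip the matching along the path back to the dummy column.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sum(cost[p[j] - 1][j - 1] for j in range(1, m + 1) if p[j])

# The case where the greedy sweep from the question returns 3:
print(hungarian_min_sum([[0, 0, 0], [0, 0, 0], [1, 2, 3]]))  # 1
```

Unlike the greedy sweep, this revises earlier choices whenever a later row needs a column that is already taken, which is why it gets the last example right.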