Search code examples
c#arraysmemory-managementout-of-memoryfloyd-warshall

How to handle arrays of extremely large strings in C#


I'm implementing the Floyd-Warshal algorithm, as can be found here.
I'm not just interested in the shortest distance between nodes, but also in the path, corresponding with that distance.

In order to do this, I have modified the algorithm as follows:

double[,] dist = new double[V, V];        // existing line
string[,] connections = new string[V, V]; // new line, needed for remembering the path

...
for (i = 0; i < V; i++){
  for (j = 0; j < V; j++){
    dist[i, j] = graph[i, j];
    connections[i, j] = $"({i},{j})";}} // added: initialisation of "connections"

...
if (dist[i, k] + dist[k, j] < dist[i, j])
{
  dist[i, j] = dist[i, k] + dist[k, j];
  connections[i, j] = connections[i, k] + "-" + connections[k, j]; // Added for remembering shortest path
}

I'm running this algorithm with a snake-like list of locations of one million, all of them simply being added one after the other.

As a result, my connections array looks as follows:

    [0, 0]  "(0,0)"
    [0, 1]  "(0,1)"
    [0, 2]  "(0,1)-(1,2)"
    [0, 3]  "(0,1)-(1,2)-(2,3)"
    [0, 4]  "(0,1)-(1,2)-(2,3)-(3,4)"
    [0, 5]  "(0,1)-(1,2)-(2,3)-(3,4)-(4,5)"
    ...
    [0, 787]  "(0,1)-(1,2)-...(786,787)" // length of +7000 characters
    ...

... at the moment of my OutOfMemoryException (what a surprise) :-)

I would like to avoid that OutOfMemoryException, and start thinking of different techniques:

  • Forcing garbage collection once a variable is not needed anymore (in case this is not yet done)
  • "Swapping" very large objects between memory and hard disk, in order to get more memory access.

I believe the second option being the most realistic (don't kill me if I'm wrong) :-) Is there a technique in C# which makes that possible?

Oh, if you react like "You're an idiot! There's a far better way to keep the shortest paths in Floyd-Warshal!", don't refrain from telling me how :-)

Edit: taking into account the multiple comments, for which I'm very grateful:

In the meantime, I've replaced my strings with Lists of Lists of Points, and this seems to be working fine:

Instead of:

string[,] l_connections;

I have:

List<List<List<Point>>> l_connections;

The speed has doubled, and when working with huge collections of dictionaries (+1000 entries of ...), I get a System.OutOfMemoryException only at ±800 entries instead of ±650.

That's already a huge improvement, but does anybody know to get even better?

Edit: information about garbage collector and its settings:

There is following GC and GCSettings information:

System.GC.MaxGeneration:[2]
IsServerGC:[False]
LargeObjectHeapCompactionMode:[Default]
LatencyMode:[Interactive]

I have altered the LargeObjectHeapCompactionMode to CompactOnce but this brought the performance back to similar results as when I was working with large strings instead of Lists.

Edit: how to work with List collection:
Hereby the code, when working with List collections:

public void floydWarshall(Dictionary<(int x, int y), double> dictionary, out double[,] dist, out List<List<List<Point>>> connections)
{
    int dictionary_size = (int)Math.Ceiling(Math.Sqrt(dictionary.Count));
    dist = new double[dictionary_size, dictionary_size];
...

    for (k = 0; k < dictionary_size; k++)
    {
        // Pick all vertices as source one by one
        for (i = 0; i < dictionary_size; i++)
        {
            // Pick all vertices as destination for the above picked source
            for (j = 0; j < dictionary_size; j++)
            {
                // If vertex k is on the shortest path from i to j, then update dist[i][j]
                if (dist[i, k] + dist[k, j] < dist[i, j])
                {
                    dist[i, j] = dist[i, k] + dist[k, j];
                    connections[i][j] =  new List<Point>();
                    connections[i][j].AddRange(connections[i][k]);
                    connections[i][j].AddRange(connections[k][j]);
                 }
            }
        }
    }

Thanks in advance


Solution

  • Oh, if you react like "You're an idiot! There's a far better way to keep the shortest paths in Floyd-Warshal!", don't refrain from telling me how :-)

    There is indeed.

    Every shortest path from m to n is either a direct step m-n or a path through k, m-...-k-...-n, where m-...-k is the shortest path from m to k, already stored at array index [m,k] and k-...-n is the shortest path from k to n, already stored at array index [k,n].

    So all you need to store at [m,n] is the value of k, which is any single interior vertex along the shortest path. (The k variable in the question code is a valid choice)

    Storage requirement: O(V^2), down from O(V^3).

    A suitable C# data structure would be int[,], where

    • path[m,n] == m means a direct link
    • path[m,n] == n means "I don't know yet" (storing null in int?[,] is also a reasonable approach, but wastes some memory)
    • path[m,n] == k && k != m && k != n means the shortest path passes through k and the path can be reconstructed by recursing both [m,k] and [k,n]

    List<int> GetPathList(int n, int m, List<int> buffer = null)
    {
        if (buffer == null) {
            buffer = new List<int>();
            buffer.Add(n);
        }
        if (n == m) return buffer;
        int k = path[n,m];
        if (k == n) {
            buffer.Add(m);
            return buffer;
        }
        if (k == m) return null;
        if (null == GetBuffer(n,k,buffer)) return null;
        return GetBuffer(k,m,buffer);
    }
    string GetPathString(m,n) => string.Join('-', GetPathList(n,m));