I'm implementing the Floyd-Warshal algorithm, as can be found here.
I'm not just interested in the shortest distance between nodes, but also in the path, corresponding with that distance.
In order to do this, I have modified the algorithm as follows:
double[,] dist = new double[V, V]; // existing line
string[,] connections = new string[V, V]; // new line, needed for remembering the path
...
for (i = 0; i < V; i++){
for (j = 0; j < V; j++){
dist[i, j] = graph[i, j];
connections[i, j] = $"({i},{j})";}} // added: initialisation of "connections"
...
if (dist[i, k] + dist[k, j] < dist[i, j])
{
dist[i, j] = dist[i, k] + dist[k, j];
connections[i, j] = connections[i, k] + "-" + connections[k, j]; // Added for remembering shortest path
}
I'm running this algorithm with a snake-like list of locations of one million, all of them simply being added one after the other.
As a result, my connections
array looks as follows:
[0, 0] "(0,0)"
[0, 1] "(0,1)"
[0, 2] "(0,1)-(1,2)"
[0, 3] "(0,1)-(1,2)-(2,3)"
[0, 4] "(0,1)-(1,2)-(2,3)-(3,4)"
[0, 5] "(0,1)-(1,2)-(2,3)-(3,4)-(4,5)"
...
[0, 787] "(0,1)-(1,2)-...(786,787)" // length of +7000 characters
...
... at the moment of my OutOfMemoryException
(what a surprise) :-)
I would like to avoid that OutOfMemoryException
, and start thinking of different techniques:
I believe the second option being the most realistic (don't kill me if I'm wrong) :-) Is there a technique in C# which makes that possible?
Oh, if you react like "You're an idiot! There's a far better way to keep the shortest paths in Floyd-Warshal!", don't refrain from telling me how :-)
Edit: taking into account the multiple comments, for which I'm very grateful:
In the meantime, I've replaced my strings with List
s of List
s of Point
s, and this seems to be working fine:
Instead of:
string[,] l_connections;
I have:
List<List<List<Point>>> l_connections;
The speed has doubled, and when working with huge collections of dictionaries (+1000 entries of ...), I get a System.OutOfMemoryException
only at ±800 entries instead of ±650.
That's already a huge improvement, but does anybody know to get even better?
Edit: information about garbage collector and its settings:
There is following GC and GCSettings information:
System.GC.MaxGeneration:[2]
IsServerGC:[False]
LargeObjectHeapCompactionMode:[Default]
LatencyMode:[Interactive]
I have altered the LargeObjectHeapCompactionMode
to CompactOnce
but this brought the performance back to similar results as when I was working with large strings instead of List
s.
Edit: how to work with List collection:
Hereby the code, when working with List
collections:
public void floydWarshall(Dictionary<(int x, int y), double> dictionary, out double[,] dist, out List<List<List<Point>>> connections)
{
int dictionary_size = (int)Math.Ceiling(Math.Sqrt(dictionary.Count));
dist = new double[dictionary_size, dictionary_size];
...
for (k = 0; k < dictionary_size; k++)
{
// Pick all vertices as source one by one
for (i = 0; i < dictionary_size; i++)
{
// Pick all vertices as destination for the above picked source
for (j = 0; j < dictionary_size; j++)
{
// If vertex k is on the shortest path from i to j, then update dist[i][j]
if (dist[i, k] + dist[k, j] < dist[i, j])
{
dist[i, j] = dist[i, k] + dist[k, j];
connections[i][j] = new List<Point>();
connections[i][j].AddRange(connections[i][k]);
connections[i][j].AddRange(connections[k][j]);
}
}
}
}
Thanks in advance
Oh, if you react like "You're an idiot! There's a far better way to keep the shortest paths in Floyd-Warshal!", don't refrain from telling me how :-)
There is indeed.
Every shortest path from m
to n
is either a direct step m-n
or a path through k
, m-...-k-...-n
, where m-...-k
is the shortest path from m
to k
, already stored at array index [m,k]
and k-...-n
is the shortest path from k
to n
, already stored at array index [k,n]
.
So all you need to store at [m,n]
is the value of k
, which is any single interior vertex along the shortest path. (The k
variable in the question code is a valid choice)
Storage requirement: O(V^2)
, down from O(V^3)
.
A suitable C# data structure would be int[,]
, where
path[m,n] == m
means a direct linkpath[m,n] == n
means "I don't know yet" (storing null
in int?[,]
is also a reasonable approach, but wastes some memory)path[m,n] == k && k != m && k != n
means the shortest path passes through k
and the path can be reconstructed by recursing both [m,k]
and [k,n]
List<int> GetPathList(int n, int m, List<int> buffer = null)
{
if (buffer == null) {
buffer = new List<int>();
buffer.Add(n);
}
if (n == m) return buffer;
int k = path[n,m];
if (k == n) {
buffer.Add(m);
return buffer;
}
if (k == m) return null;
if (null == GetBuffer(n,k,buffer)) return null;
return GetBuffer(k,m,buffer);
}
string GetPathString(m,n) => string.Join('-', GetPathList(n,m));