Search code examples
c#arraysalgorithmsparse-matrix

faster algorithm or technique for building sparse arrays in c#


I have a matrix-building problem. To build the matrix (for a 3rd party package), I need to do it row-by-row by passing a double[] array to the 3rd-party object. Here's my problem: I have a list of objects that represent paths on a graph. Each object is a path with a 'source' property (string) and a 'destination' property (also string). I need to build a 1-dimensional array where all the elements are 0 except where the source property is equal to a given name. The given name will occur multiple times in the path list. Here's my function for building the sparse array:

    static double[] GetNodeSrcRow3(string nodeName)
    {
        double[] r = new double[cpaths.Count ];
        for (int i = 1; i < cpaths.Count; i++)
        {
            if (cpaths[i].src == nodeName) r[i] = 1;
        }
        return r;
    }

Now I need to call this function about 200k times with different names. The function itself takes between 0.05 and 0.1 seconds (timed with Stopwatch). As you can imagine, if we take the best possible case of 0.05 seconds * 200k calls = 10,000 seconds = 2.7 hours which is too long. The object 'cpaths' contains about 200k objects.

Can someone think of a way to accomplish this in a faster way?


Solution

  • I can't see the rest of your code, but I suspect most of the time is spent allocating and garbage collecting all the arrays. Assuming the size of cpaths doesn't change, you can reuse the same array.

    private static double[] NodeSourceRow == null;
    private static List<int> LastSetIndices = new List<int>();
    
    static double[] GetNodeSrcRow3(string nodeName) {
        // create new array *only* on the first call
        NodeSourceRow = NodeSourceRow ?? new double[cpaths.Count];
    
        // reset all elements to 0
        foreach(int i in LastSetIndices) NodeSourceRow[i] = 0;
        LastSetIndices.Clear();
    
        // set the 1s
        for (int i = 1; i < cpaths.Count; i++) {
            if (cpaths[i].src == nodeName) {
                NodeSourceRow[i] = 1;
                LastSetIndices.Add(i);
            }
        }
    
        // tada!!
        return NodeSourceRow;
    }
    

    One drawback potential drawback would be if you need all the arrays to used at the same time, they will always have identical contents. But if you only use one at a time, this should be much faster.