Search code examples
graphfortransparse-matrixcircuitpardiso

Fast accessing elements of Compressed Sparse Row (CSR) sparse matrix


I want to test some of the newer sparse linear solvers and I want to know if there is a fast way of filling in the matrix. The format I'm interested is CSR (http://goo.gl/hLXYd). Let's say the matrix, in CSR format, is given by:

values(num non-zero elements)
columns(num non-zero elements)
rowIndex(num rows + 1)

The sparse matrix under consideration derives from networks. So, I have thousands of nodes and some of them are connected between them by lines. So, the matrix is structurally symmetric. Each connection (i,j) adds something to the diagonal terms (i,i) and (j,j) and to the off-diagonal (i,j) and (j,i). I could have several connections between the same nodes (i,j,1), (i,j,2)... So, I might need to revisit the 2 diagonal and 2 off-diagonal elements more than once.

I know I can get the beginning of the row by doing rowIndex(i). Then, I would have to run through the elements columns(rowIndex(i):rowIndex(i+1)-1) to find where is j situated.

The question:

Is there a way of accessing the elements faster, while in CSR format, without having to do a search every time I want to update an element?

Some clarifications: I just need to fill in the matrix from scratch. The matrix is structurally symmetric and not really symmetric. The values saved have to do with network data (impedances, resistances etc), they have real values. In general Value(i,j)<>Value(j,i). I have tuples of the form (name1,i1,j1,value1), (name2,i2,j2,value2) etc. These tuples are not sorted, and 2 tuples can refer to the same i,j values, meaning they need to be added

Thanks in advance!


Solution

  • What you have is so called triplet sparse format. Creation of CRS, including removing duplicate entries and summing the values, can be implemented very efficiently. Before programing it yourself, have a look at the SuiteSparse library. It is written in C, but I'm sure you will understand the principle. What interests you is the cholmod_triplet.c file, which implements the functionality you need.

    Essentially, the conversion is performed using two phase bucket sort on your row and column indices. This algorithm has linear complexity, which is important if you are interested in processing large data sets.

    Edit If you want to skip explicit creation of the triplet format all together, you can do that by generating the (row, col) connectivities on the fly and adding them to a dynamic sparse structure. I usually do it using insertion sort and sorted lists, which is in practice the fastest. It is also faster than triplet to CRS conversion, and uses much less memory. The method goes as follows:

    • if you know approximately, how many non-zero entries there are in every row, for every row you pre-allocate an array of (empty) column indices, and a separate array for the values (not linked list, but a simple array) of that size. Something like

      static_lists_cols[row] = malloc(sizeof(int)*expected_number_of_non_zeros) static_lists_vals[row] = malloc(sizeof(double)*expected_number_of_non_zeros)

    • If you do not know that, you choose an initial size and reallocate as needed (using some block size large enough to avoid reallocation overhead) when the row lists are full.

    • for every (row, col) pair you insert the col into the sorted list corresponding to row using insertion sort. For small (up to a few hundred) non-zeros per row linear search is the fastest. For larger number of non-zeros per row you can use bisection to locate the correct place to insert the col index.
    • col is inserted into rowth sorted list by moving the non-zero entries with higher column index in the sorted list. This is cache-friendly, since the rows are in practice small enough to fit into any cache nowadays.
    • After you finish you need to assemble the individual sorted lists into a valid CRS structure by copying the individual row lists into the final columns. The same with values.
    • You could actually avoid the last step by pre-allocating a static 'array of lists' if you are ok that some of the rows can have zero entries. You will hence have a constant number of entries per row, some of which might be zero. Sometimes that is ok.

    This method is faster than using triplet to sparse conversion, at least for FEM models, for which I use it. The general reason is that memory bandwidth is the bottleneck here, and the above scheme uses much less memory:

    • creating the triplet format takes time, and you need to write the triplets to memory
    • conversion to CRS requires reading and writing the triplets at least once to sort them (actually a bit more than once, if you look at the algorithm. You sort twice, and you need auxiliary data structures.)
    • depending on the connectivity structure, you may end up having a large number of (row, col) duplicates in the triplet format, which are removed during the assembly by adding the corresponding values. This overhead does not exist in the method above - if the col already exists in the row list, you simply update the corresponding value.
    • updating the sorted lists can be done in parallel if you assign row ranges to individual workers. No communication, nor synchronization is needed. Assuring load balancing is another story...

    Have a look at a performance comparison of using those two methods (Figure 1) for triangular elements in 2D. Note that the performance difference depends on the ratio of the number of entries in the triplet to assembled sparse matrix format (Table 2). But in general, the method is never worse than triplet to crs conversion, and triplets need to be created in the first place. You can also download a MATLAB MEX function sparse_create, which is a part of mutils package (see the downloads section).