Search code examples
csparse-matrixnumerical-methods

Read a file (with a sparse matrix) and generate CSR


I need to read a square matrix that is in a .txt file and store it in the CSR format. I know how to read the matrix, but I haven't been able to come with an idea on how to make the CSR.

This is the code to read the matrix:

#include<stdio.h>
#include<string.h>
int main(){
  FILE *f;
  int i,j,n;
  float x;
  f=fopen("file.txt","r");
  if(f=NULL){
  printf(\n No file found \n");
  }
  fscanf(f,"%d",&n);
  for(i=0;i<n;i++){
     for(j=0;j<n;j++){
        fscan(f,"%f",&x);
        printf("\n element (%d,%d) = %f",i,j,x); //this is just to see that the matrix looks like I want it to be, it's not part of CSR problem
     }
  }
  return 0;
}

Any suggestions?


Solution

  • Let's interpret the CSR (or CRS, for "compressed row storage") structure as a structure of three arrays:

    struct csr {
        size_t row_count;
        size_t *row_indices;
        float *values;
        size_t *column_indices;
    };
    

    Here, values and column_indices should point to arrays of the same size (alternatively, you could structure them as one array of int-double pairs), and row_indices should point to an array of indices into these two arrays. (Actually, we're going to take some liberties with row_indices; rather than pointing at the first column/value of a row, we're going to point one past the last column/value of the row.)

    Your *.txt file format seems to contain a square matrix, and starts with a size parameter (n).

    struct csr your_csr;
    size_t n;
    
    fscanf(f, "%d", &n);
    your_csr.row_count = n;
    

    After reading this n, we can allocate row_indices. Unfortunately, we don't know how many nonzero values there will be in the matrix. For now, this simple implementation will just allocate n x n elements, and a more conservative approach is left as an exercise to the reader.

    your_csr.row_indices     = calloc(n, sizeof(your_csr.row_indices[0]));
    your_csr.values          = calloc(n * n, sizeof(your_csr.values[0]));
    your_csr.column_indices  = calloc(n * n, sizeof(your_csr.column_indices[0]));
    

    Now that we have our memory in place, let's deal with the matrix data.

    size_t pair_index = 0;
    
    for (size_t row_index = 0; row_index < n; row_index++) {
        for (size_t column_index = 0; column_index < n; column_index++) {
            float value;
            fscanf(f, "%f", &value);
    

    For every non-zero value you read, you're going to write what you know into your values and column_indices array

            if (value != 0.0) {
                your_csr.values[pair_index] = value;
                your_csr.column_indices[pair_index] = column_index;
                pair_index++;
            }
        }
    

    After you've read a row, you're going to write down where the row ends.

        your_csr.row_indices[row_index] = pair_index;
    }
    

    Now, your_csr contains all the data you need to know about the matrix:

    • your_csr.row_count contains the number of rows in the matrix.
    • The length of the other two arrays is your_csr.row_indices[your_csr.row_count - 1].
    • If you want to know in which column your_csr.values[x] belongs, look at your_csr.column_indices[x].
    • The (non-zero) values of the first row (let's call it row 0; mathematicians would disagree, but zero-based indices are great for programming) can be found in your_csr.values[x], where 0 <= x && x < your_csr.row_indices[0].
    • The values of any other row r can be found your_csr.values[x], where your_csr.values[r - 1] <= x && x < your_csr.row_indices[r].