Calculate the sum of each column of a sparse matrix using C

I have a matrix of size (8 x 8) as shown below, I have to convert it into a form of sparse matrix and compute the sum of non-zero elements for each column using C language.

 Matrix =   [1 2 0 4 0 0 0 0;
             0 1 3 0 2 0 0 0;
             1 0 4 7 0 0 0 0;
             0 6 0 0 1 8 0 0;
             0 0 0 0 4 0 0 0;
             0 0 0 0 8 1 1 0;
             0 0 0 0 9 0 2 0;
             0 0 0 0 0 0 7 1]

Could someone please advise on how to proceed on this problem? Please note that this is a sample matrix and in practice I have huge matrices of dimensions around (15,000 rows x 25,000 columns)

Solution

Could someone please advise on how to proceed on this problem?

Find out all required and optional operations on the sparse matrices.

Many operations require either row-wise or column-wise data access. Some operations (like efficient naïve matrix-matrix multiplication) require both; the particular one depending on which side of the multiplication operator the matrix is on. Some operations benefit from simple transposing. Collecting the operations, and sorting them by their requirements, gives you criteria you can use to pick the "best" (closest to optimal for your use cases) implementation.
Choose the proper type to describe the values in the matrix.

In particular, I recommend considering float, double, unsigned char, signed char, uintN_t, and intN_t types (for N = 8, 16, 32, or 64).

You need the precision and range to describe the values you have, without wasting memory for precision or range you do not need.
Choose which data structure(s) you will use to implement the sparse matrix.

The Wikipedia article on sparse matrix describes a few typical structures (DOK, LIL, COO, CSR, CSC, Yale). If you implement more than one, you'll need to write not only all low-level operations on each, but all combinations for arithmetic operations. (For example, if your sparse matrices use either compressed sparse row (CSR) or compressed sparse column (CSC) formats, then you'll need four matrix-matrix multiplication routines depending on the types of the two argument matrices (CSR×CSR, CSC×CSR, CSR×CSC, CSC×CSC), because matrix multiplication is not commutative.)

If performance is important, cache effects of the various data structures should be considered carefully. You'll want to examine memory in sequential order, in one or more arrays, to take advantage of CPU prefetching and caching. If there is lots of data, you'll want to "pack" everything as tight as possible (because memory bandwidth tends to be the limiting bottleneck with matrix operations), but keep the operations needed to handle each matrix element to a minimum.
Write basic matrix input/output routines, and unit tests to check that your data structures work correctly.

Typically, you'll need to read and write these matrices from/to files or streams (like standard input and output). Write those first. Then, write some debug functions that describe how the matrix data is stored. At minimum, you'll want to write a test program that reads a matrix (from standard input), prints it (to standard output) and the storage format (to standard error); you can supply it with a few test case inputs (at least one with random data) to see if the roundtrip retains the correct data. I often use awk to both generate the test data sets, and to compare numerical output to expected numerical output.
Write a function that given sparse matrix M, returns a row vector v, where v_i = ∑ M_j,i

In other words, element i in v is the sum of all elements of M in column i.
Write a test program for your function, and test it.

Typically, you'll want your program to read a matrix (from standard input), and write the vector to standard output.