Search code examples
csparse-matrix

Standard format for writing Compressed Row Storage into text files?


I have a large sparse matrix stored in Compressed Row Storage (CRS) format. This is basically three arrays: an array containing the Values, an array for Column Index, and a final array containing the Row Pointers. E.g. http://web.eecs.utk.edu/~dongarra/etemplates/node373.html

I want to write this information into a text (.txt) file, which is intended to be read and put into three arrays using C. I currently plan to do this by writing all the entries in the Value array in one long line separated by commas. E.g. 5.6,10,456,78.2,... etc. Then do the same for the other two arrays.

My C code will end read the first line, put all the values into an array labeled "Value". And so on.

Question

Is this "correct"? Or is there a standard way of putting CRS data into text files?


Solution

  • No standard format that I'm aware of. You decide on a format that makes your life easy.

    First, consider that if you want to look at one of these text files, you'll be instantly put off by the long lines. Some text editors might simply hate you. There's nothing wrong with splitting lines up.

    Second, consider writing out the number of elements in each array (well, I suppose there's only two different array lengths for the three arrays) at the beginning of the file. This will let you preallocate your arrays. If you have all array lengths at hand, you have the option of doing a single memory allocation.

    Finally, consider writing out some sensible tag names. Some kind of header that can identify your file is the correct format, then something to denote the start of each array. It's kind of a sanity thing for your code to detect problems with the file. It might just be one character, but it's something.

    Now... call me a grungy old programmer, but I'd probably just write whole lot in binary. Especially if it's floating point data, I wouldn't want to deal with the loss of precision you get when you write out numbers as text (or the space they can consume when you write them with full precision). Binary files are easy to write and quick to run. You just have to be careful if you're going to be using them across platforms with different endian order.

    That's my 2 cents worth.. Hope it's useful to you.