Tags: matlab, sparse-matrix, matlab-table

Sparse table in MATLAB, is it possible?


I am dealing with a matrix in MATLAB which is sparse and has many rows and columns. In this case, the rows and columns of the matrix correspond to the ids of particular items. Let's call them id1 and id2.

It would be nice if the row and column ids could be embedded in the matrix, so I could access them easily without creating extra variables to hold the two sets of ids.

The answer would probably be to use the table data type. Tables would be an ideal fit for my needs; however, I was wondering whether I can create a table from a sparse matrix.

A  [m*n] sparse matrix    %% m & n are huge 
id1 [1*m] , id2 [1*n]     %% two vectors containing numeric ids for rows and columns

Could we obtain the following?

T  [m*n] sparse table matrix

Thanks for sharing your view with me.


Solution

  • I will address both the question and the comments, to clear up some confusion.

    The short answer

    There is no sparse table class in MATLAB. It cannot be done. Use sparse() matrices.
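As a minimal sketch of the sparse() route, the ids can live in two plain vectors next to the matrix, with ismember translating an id into a row or column position. All ids, sizes and values below are made up for illustration.

```matlab
% Hypothetical example data: 5 rows and 4 columns labelled by numeric ids
id1 = [101 205 307 410 512];        % row ids (1-by-m)
id2 = [11 22 33 44];                % column ids (1-by-n)

% Build the sparse matrix from (row, column, value) triplets
rows = [1 3 5];
cols = [2 4 1];
vals = [7.5 -1 3];
A = sparse(rows, cols, vals, numel(id1), numel(id2));

% Look up the entry at row id 307, column id 44
[~, r] = ismember(307, id1);        % position of the row id in id1
[~, c] = ismember(44, id2);         % position of the column id in id2
v = full(A(r, c))                   % -1
```

Only the nonzero entries are stored, and the id vectors cost m + n numbers, so this stays cheap even when m and n are huge.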

    The long answer

    There are reasons why sparse tables make little sense:

    1. Philosophically speaking, the advantage of having nice row and column labels is completely lost if you are working with a big panel of data and/or if the data is sparse.

      Scrolling through 246829 rows and 33336 columns? That is useful only on rare occasions, e.g. when you are debugging your code and a specific outlier is throwing your results off. Also, most of what you would see is a sea of zeros.

    2. Technically, a table can hold several columns in a single variable, e.g. table(rand(10,2), rand(10,1)) is a valid table. How would you define sparsity on such a table?

      Fine, suppose you are working with a matrix-like table, i.e. one element per table cell and the same numeric class throughout. Still, none of the algebraic operators are defined on a table(). So you need to extract the content first in order to perform any operation that spans more than a single column of data. And once the data is extracted, what you have is e.g. your double (full) matrix or, in the ideal case, a double sparse matrix.
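Both points can be sketched in a few lines, assuming a MATLAB release that has the table class (R2013b or later); the data is arbitrary. The first part shows that the table's own size and the size of its numeric content disagree, so it is unclear which shape "sparse" would refer to. The second part extracts a matrix-like table with T{:,:} and does the algebra on the plain (optionally sparse) matrix.

```matlab
% 1) A valid table whose first variable holds two columns of data
T = table(rand(10,2), rand(10,1));
size(T)            % [10 2]: the table sees two variables
size(T{:,:})       % [10 3]: the extracted data has three columns

% 2) A matrix-like table: one numeric value per cell, same class throughout
D = array2table([1 0 0; 0 2 0; 0 0 3]);

% Extract the content first, then work on the plain matrix
M = D{:,:};          % 3-by-3 full double matrix (table2array(D) is equivalent here)
S = sparse(M);       % sparse copy, worthwhile when most entries are zero
y = S * [1; 1; 1]    % [1; 2; 3]
```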

    Now, a few misconceptions to clear:

    • Fewer variables imply clearer/cleaner code. Not true. You are probably thinking of the extreme (bad-practice) case of "how do I create a series of variables a1, a2, a3, etc.?".

      There is a sweet spot between verbosity, number of variables, amount of comments, and code clarity/maintainability. Only with time and experience do you find the right balance.

    • Control over data cannot go without visual inspection. This approach does NOT scale with big data, and the sooner you abandon it, the sooner your code will become reliable. You need to verify your results systematically rather than rely on visual inspection. The chance of failing to spot a problem visually grows exponentially with the dimension of the data, much faster than with systematic tests.
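A sketch of what systematic verification can look like, assuming the sparse-matrix-plus-id-vectors layout from the question. The data, the outlier threshold of 100 and the variable names are hypothetical. Instead of eyeballing the matrix, it asserts structural properties and maps only the flagged nonzeros back to their ids, so you get a labelled view of just the few entries worth looking at.

```matlab
% Hypothetical data: sparse matrix plus its row/column id vectors
id1 = [101 205 307 410 512];
id2 = [11 22 33 44];
A = sparse([1 3 5 2], [2 4 1 3], [7.5 -1 3 900], 5, 4);

% Checks that scale with size, instead of scrolling through the matrix
assert(isequal(size(A), [numel(id1) numel(id2)]), 'ids do not match matrix size');
assert(~any(isnan(nonzeros(A))), 'matrix contains NaN');

% Flag suspicious nonzeros (the threshold is an arbitrary example value)
[i, j, v] = find(A);            % row indices, column indices, values of nonzeros
isOutlier = abs(v) > 100;

% Map the flagged entries back to their ids (reshape keeps them as columns)
outliers = table(reshape(id1(i(isOutlier)), [], 1), ...
                 reshape(id2(j(isOutlier)), [], 1), ...
                 v(isOutlier), ...
                 'VariableNames', {'id1', 'id2', 'value'})
```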

    Some background info on my work:

    I work with high-frequency prices, that's terabytes of data. I also extended the table() class with additional methods and fixes to help me with my work (see https://github.com/okomarov/tableutils), but I do not see how sparsity is a useful feature to add to table().