
MATLAB: from text file to sparse matrix


I have a huge text file in the following format:

1   2
1   3
1   10
1   11
1   20
1   376
1   665255
2   4
2   126
2   134
2   242
2   247

The first column is the x coordinate (row index) and the second column is the y coordinate (column index) of a nonzero entry. In other words, if I were to construct the full matrix, it would be:

M = zeros(N, N);
M(1, 2) = 1;
M(1, 3) = 1;
.
.
M(2, 247) = 1;

This text file is huge and can't be loaded into main memory at once, so I must read it line by line and store the entries in a sparse matrix.

So I need the following function:

function mat = generate( path )
    fid = fopen(path);
    tline = fgetl(fid);
    % initialize an empty sparse matrix. (I know I assigned Mat(1, 1) = 1)
    mat = sparse(1);
    while ischar(tline)
        tline = fgetl(fid);
        if ischar(tline)
            C = strsplit(tline);
        end
        mat(C{1}, C{2}) = 1;
    end
    fclose(fid);
end

Unfortunately, apart from the first entry, it just puts garbage in my sparse matrix. Demo input:

1 7
1 9
2 4
2 9

If I print the sparse mat I get:

   (1,1)        1
  (50,52)       1
  (49,57)       1
  (50,57)       1

Any suggestions?


Solution

  • Fixing what you have...

    Your problem is that C is a cell array of character arrays, not numbers. You need to convert the strings you read from the file into numeric values. You can either pass the output of strsplit through str2double, or skip strsplit and call str2num on the whole line. Since tline is a space-delimited character array of integers in this case, str2num is the easiest to use to compute C:

    C = str2num(tline);
    

    Then you just index C like an array instead of a cell array:

    mat(C(1), C(2)) = 1;
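
Putting the fix together, a minimal sketch of the corrected function might look like the following. It also moves the fgetl call to the end of the loop body, because the loop in the question reads a second line before using the first (so the first line of the file is skipped) and reuses the previous C once fgetl reaches the end of the file. The error identifier and the onCleanup guard are additions for illustration, not part of the original code.

```matlab
function mat = generate(path)
    % Sketch of the question's generate(), still reading line by line.
    fid = fopen(path, 'r');
    if fid == -1
        error('generate:cannotOpen', 'Could not open %s', path);
    end
    closer = onCleanup(@() fclose(fid));  % closes the file even on error

    mat = sparse(0, 0);                   % truly empty; grows on assignment
    tline = fgetl(fid);
    while ischar(tline)                   % fgetl returns -1 at end of file
        C = str2num(tline);               % '1 7' -> [1 7]; '' -> []
        if numel(C) >= 2                  % skip blank or malformed lines
            mat(C(1), C(2)) = 1;
        end
        tline = fgetl(fid);               % read the next line last
    end
end
```

Growing a sparse matrix one element at a time can be slow for very large files, so this is mainly useful as a direct repair of the original loop.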
    

    Extra tidbit: If you were wondering how your demo code still worked even though C contained characters, it's because MATLAB has a tendency to automatically convert variables to the correct type for certain operations. In this case, the characters were converted to their double ASCII code equivalents: '1' became 49, '2' became 50, etc. Then it used these as indices into mat.
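
The conversion described above can be reproduced in a few lines. This is only an illustrative sketch using the last line of the demo input; the variable names are arbitrary.

```matlab
C = strsplit('2 9');           % cell array of char vectors: {'2', '9'}

codes = double([C{1}, C{2}]);  % character codes of '2' and '9'
nums  = str2double(C);         % numeric values parsed from the text

m = sparse(1);                 % same starting point as in the question
m(C{1}, C{2}) = 1;             % chars are used as their codes: m(50, 57) = 1

disp(codes);                   % character codes, not the intended indices
disp(nums);                    % the intended indices
disp(size(m));                 % the matrix grew to fit the bogus indices
```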


  • A simpler alternative...

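    You don't even have to bother with all that mess above: provided the two columns of indices fit in memory (they need far less space than a full N-by-N matrix), you can replace your entire function with a much simpler approach using dlmread and sparse, like so. (In recent MATLAB releases, readmatrix is the recommended replacement for dlmread.)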

    data = dlmread(filePath);
    mat = sparse(data(:, 1), data(:, 2), 1);
    clear data;  % Save yourself some memory if you don't need it any more
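
Since the question states that the file is too large to load at once, one possible variation is to read it in fixed-size blocks with textscan and fold each block into the sparse matrix. The sketch below assumes the matrix dimension n is known in advance (the N from the question); the function name generateChunked, the blockSize parameter, and its default value are hypothetical.

```matlab
function mat = generateChunked(path, n, blockSize)
    % Hypothetical helper: builds an n-by-n sparse 0/1 matrix from a
    % two-column index file, reading blockSize lines at a time.
    if nargin < 3
        blockSize = 1e6;                  % assumed default, tune to taste
    end
    fid = fopen(path, 'r');
    if fid == -1
        error('generateChunked:cannotOpen', 'Could not open %s', path);
    end
    closer = onCleanup(@() fclose(fid));  % closes the file even on error

    mat = sparse(n, n);                   % all-zero n-by-n sparse matrix
    while true
        block = textscan(fid, '%f %f', blockSize);
        k = min(numel(block{1}), numel(block{2}));
        if k == 0
            break;                        % end of file or unparsable data
        end
        % Duplicate index pairs are summed here and flattened below.
        mat = mat + sparse(block{1}(1:k), block{2}(1:k), 1, n, n);
    end
    mat = double(mat ~= 0);               % force every nonzero to exactly 1
end
```

With this approach only one block of indices plus the sparse matrix itself is held in memory at any time. A call might look like mat = generateChunked('edges.txt', 700000), where both the file name and the dimension are placeholders.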