
How can I pass a sparse matrix from Perl to R?


I have a very sparse Perl matrix (array of arrays), where undef values are equivalent to zeros. It has 10-1000 rows and up to 100k columns. It looks like:

$ARRAY1 = [
            [
              ( undef ) x 1069,
              1,
              ( undef ) x 47,
              1,
              ( undef ) x 11,
              2,
              ( undef ) x 50,
              1,
              ( undef ) x 23,
              1,
              ( undef ) x 6033,
              ...
            ],
            [
              ...
            ],
            ...
          ]

... as I said - very sparse.

I want to use this matrix in an R script (see previous post). One way is to print the table to a file from Perl - one line per row, and print 0 anytime we run into undef.

But perhaps there is a better, more compact way to pass this sparse matrix?


Solution

  • That's a moderately hard problem.

    I think I would start by making simple things possible, to paraphrase a Perl slogan:

    1. Learn about the two or three sparse matrix packages for R, such as slam, SparseM, ....
    2. Pick one you prefer and learn how to efficiently build a sparse matrix, presumably from triplets such as (x,y,value) to encode value at position (x,y).
    3. Write Perl code to emit your sparse matrix in that form to a tempfile.
    4. Read that tempfile in R and build your sparse matrix.
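
    As a rough illustration of step 3, here is a minimal sketch in Perl. It assumes the values are numeric, that undef and 0 both mean "empty", and that a tab-separated layout with 1-based row and column indices suits whichever R package you pick. The helper names sparse_triplets and write_triplets are made up for this example and are not part of any module.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: turn an array of arrays into a list of
    # [row, col, value] triplets. Indices are 1-based because R counts from 1.
    # Cells that are undef or 0 are skipped.
    sub sparse_triplets {
        my ($matrix) = @_;
        my @triplets;
        for my $r (0 .. $#{$matrix}) {
            my $row = $matrix->[$r] or next;    # skip rows that are undef
            for my $c (0 .. $#{$row}) {
                my $v = $row->[$c];
                next unless defined $v && $v != 0;
                push @triplets, [ $r + 1, $c + 1, $v ];
            }
        }
        return \@triplets;
    }

    # Hypothetical helper: write the triplets, one per line, to an open
    # filehandle, preceded by a header line.
    sub write_triplets {
        my ($matrix, $fh) = @_;
        print {$fh} join("\t", qw(row col value)), "\n";
        print {$fh} join("\t", @{$_}), "\n" for @{ sparse_triplets($matrix) };
        return;
    }

    # Tiny stand-in for the real matrix, printed to STDOUT for demonstration.
    # For a tempfile, open a handle with open(my $fh, '>', $path) instead.
    my $matrix = [
        [ (undef) x 3, 1, (undef) x 2, 2 ],
        [ undef, 5 ],
    ];
    write_triplets($matrix, \*STDOUT);
    ```

    Note that the triplets alone do not record the full dimensions of the matrix (trailing empty columns or rows leave no trace), so the row and column counts probably need to be passed to R separately. On the R side, the file could then be read with read.delim and handed to a triplet-based constructor such as simple_triplet_matrix from slam, or sparseMatrix from the Matrix package if you go that route.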

    Fancier and faster cross-language serialization can come later. It's not a trivial problem.