
How to convert RowMatrix to local Matrix?


I have a problem regarding matrices in Spark.

Suppose I have a RowMatrix named X like this:

0.5    0.5  
0.25   0.0625
0.125  0.125
0.0625 0.0625
0.0625 0.25

I want to multiply this RowMatrix by its own transpose:

0.5 0.25   0.125 0.0625 0.0625
0.5 0.0625 0.125 0.0625 0.25

As far as I know, I can't multiply a RowMatrix by another RowMatrix; the right-hand operand has to be a local matrix. So I tried to convert the RowMatrix to a local dense matrix using this code:

val arr = X.rows.map(x=>x.toArray).collect.flatten
val Xlocal = Matrices.dense(X.numRows.toInt,X.numCols.toInt,arr)

But the result is wrong. I think this is because the RowMatrix data is collected row by row, while the local dense matrix is stored in column-major order, so the elements end up in the wrong positions.

Can someone help me implement this?


Solution

  • A RowMatrix does not have row indices and should only be used when the row order does not matter. If the order matters, use an IndexedRowMatrix instead.

    It is possible to convert a RowMatrix to an IndexedRowMatrix, but the row order is not guaranteed, so it is preferable to use an IndexedRowMatrix from the start. Assuming rowMat is the matrix to convert:

    import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}
    val indRowMax = new IndexedRowMatrix(rowMat.rows.zipWithIndex().map { case (v, id) => IndexedRow(id, v) })
    

    An IndexedRowMatrix can easily be converted to a local matrix (this collects all the data on the driver, so the matrix has to fit in its memory):

    val localMat = indRowMax.toBlockMatrix().toLocalMatrix()
    

    Multiplying by the transpose can then be done as follows:

    indRowMax.multiply(localMat.transpose)
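
As for why the attempt in the question scrambles the values: flattening the collected rows gives a row-major array, while Matrices.dense reads its array in column-major order. The sketch below is plain Scala with no Spark dependency and only illustrates the index mapping between the two layouts. rowMajorToColMajor is a hypothetical helper name, not a Spark API, and the input assumes the collected rows arrive in the intended order, which a RowMatrix does not promise.

```scala
// Hypothetical helper (not part of Spark): reorder a row-major flat array
// into the column-major layout that Matrices.dense expects.
def rowMajorToColMajor(values: Array[Double], numRows: Int, numCols: Int): Array[Double] = {
  require(values.length == numRows * numCols, "array length must equal numRows * numCols")
  val out = new Array[Double](values.length)
  for (i <- 0 until numRows; j <- 0 until numCols) {
    // Element (i, j) sits at i * numCols + j in row-major order
    // and at j * numRows + i in column-major order.
    out(j * numRows + i) = values(i * numCols + j)
  }
  out
}

// The 5 x 2 matrix from the question, flattened row by row
// (assumed to match what the collect-and-flatten step produces).
val rowMajor = Array(0.5, 0.5, 0.25, 0.0625, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.25)

val colMajor = rowMajorToColMajor(rowMajor, 5, 2)
// colMajor now holds the first column followed by the second:
// 0.5, 0.25, 0.125, 0.0625, 0.0625, 0.5, 0.0625, 0.125, 0.0625, 0.25
```

With Spark on the classpath, Matrices.dense(5, 2, colMajor) should then hold the rows as intended. Another option that avoids the reordering is to pass the original row-major array with the dimensions swapped, Matrices.dense(X.numCols.toInt, X.numRows.toInt, arr): read in column-major order, that array is already the transpose of X, which is the right-hand operand the question needs.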