apache-spark, pyspark, apache-spark-mllib, matrix-multiplication

pyspark.mllib DenseMatrix multiplication


I have to do matrix multiplication in PySpark but can't find how to do it with DenseMatrix. For example, this code:

from pyspark.mllib.linalg import DenseMatrix

nfeatures = 3
Q = DenseMatrix(nfeatures, nfeatures, [1, 0, 0, 0, 1, 0, 0, 0, 1])
w = DenseMatrix(nfeatures, 1, [0, 0, 0])
print(Q * w)

results in the following error:

TypeError: unsupported operand type(s) for *: 'DenseMatrix' and 'DenseMatrix'

What am I doing wrong? Is there a method for doing matrix multiplication? What is the usual way of doing this with PySpark streaming?

Best regards, Noelia


Solution

  • Neither pyspark.ml.linalg.Matrix nor pyspark.mllib.linalg.Matrix implements matrix multiplication. These classes serve mostly as exchange formats for the mllib / ml algorithms and are not designed to be used as full-featured data structures for linear algebra.

    If you need to do more than pass data to an ML / MLlib algorithm, use the standard NumPy / SciPy stack.