This sounds like a simple question, but I am not able to figure out how to display the contents of a pyspark BlockMatrix to the console. What methods should I call on it to actually see my result?
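You can call `toLocalMatrix()`, which collects the distributed blocks into a single local matrix on the driver, so it is only suitable for matrices that fit in driver memory. I used an example matrix from the Spark MLlib Python API docs to illustrate: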
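from pyspark.mllib.linalg import Matrices
from pyspark.mllib.linalg.distributed import BlockMatrix

# `sc` is assumed to be an existing SparkContext (e.g. the one the pyspark shell provides)
blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])),
                         ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))])
mat1 = BlockMatrix(blocks, 3, 2)

mat = mat1.toLocalMatrix()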
# which returns a DenseMatrix
# DenseMatrix(6, 2, [1.0, 2.0, 3.0, 7.0, 8.0, 9.0, 4.0, 5.0, 6.0, 10.0, 11.0, 12.0], 0)
# and could be further converted to a numpy array using `.toArray()`:
np_mat = mat.toArray()
# array([[ 1.,  4.],
#        [ 2.,  5.],
#        [ 3.,  6.],
#        [ 7., 10.],
#        [ 8., 11.],
#        [ 9., 12.]])
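
If the order of the values in the `DenseMatrix` output looks odd next to the array, that is because the flat list is stored column by column (column-major), and the trailing `0` is, as far as I can tell, the `isTransposed` flag. Here is a minimal pure-Python sketch of that layout. Note that `column_major_to_rows` is a hypothetical helper written only for illustration; it is not part of pyspark, and in practice `.toArray()` does this for you.

```python
# Hypothetical helper (not a pyspark API): rebuild rows from a flat,
# column-major list of values such as the one shown in the DenseMatrix output.
def column_major_to_rows(num_rows, num_cols, values):
    if len(values) != num_rows * num_cols:
        raise ValueError("values must contain num_rows * num_cols entries")
    # In column-major order, entry (i, j) lives at index i + j * num_rows.
    return [[values[i + j * num_rows] for j in range(num_cols)]
            for i in range(num_rows)]

# The flat values copied from the DenseMatrix(6, 2, [...], 0) output above.
values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0, 4.0, 5.0, 6.0, 10.0, 11.0, 12.0]
rows = column_major_to_rows(6, 2, values)
for row in rows:
    print(row)
```

The first six values form the first column and the last six form the second, which matches the rows of the array that `.toArray()` returns.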