I am trying to build the following BlockMatrix:
+---+---+---+---+
|7.0|6.0|3.0|0.0|
|3.0|2.0|5.0|1.0|
|9.0|4.0|0.0|3.0|
+---+---+---+---+
from these three sub-matrices:
+---+---+
|7.0|6.0|
|3.0|2.0|
+---+---+
+---+---+
|9.0|4.0|
+---+---+
+---+---+
|3.0|0.0|
|5.0|1.0|
|0.0|3.0|
+---+---+
Here is my code.
from pyspark.mllib.linalg import Matrices
from pyspark.mllib.linalg.distributed import BlockMatrix
blocks = sc.parallelize([(0, 0, Matrices.dense(2, 2, [7, 3, 6, 2])),
                         (2, 0, Matrices.dense(1, 2, [9, 4])),
                         (0, 2, Matrices.dense(3, 2, [3.0, 5.0, 0.0, 0.0, 1.0, 3.0]))
                         ])
blockM = BlockMatrix(blocks, 2, 2)
However, I got the error "TypeError: Cannot convert type into a sub-matrix block tuple". Any idea where I went wrong? How should I understand the BlockMatrix type? Thanks!
TL;DR You cannot create a BlockMatrix from such input directly.
BlockMatrix is a regular structure: all blocks in a BlockMatrix have to be of the same maximum size (rowsPerBlock x colsPerBlock), and they are laid out on a regular grid. Furthermore, the total number of rows and columns has to match that grid: only the blocks in the last block row and the last block column may be smaller than the block size.
In such a case the data of the smaller matrix occupies the upper left corner of its block.
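To make the grid layout concrete, here is a minimal pure-Python model (no Spark needed) of how block indices map to positions in the full matrix. The helper `assemble` and the `((block_row, block_col), (m, n, values))` tuple shape are hypothetical, chosen to mirror `Matrices.dense(m, n, values)` with column-major values. It is a sketch of the placement rule, not Spark's own implementation.

```python
# Pure-Python model of how BlockMatrix block indices map to positions.
# Assumption (hypothetical shape): each block is ((block_row, block_col), (m, n, values))
# with values in column-major order, like Matrices.dense(m, n, values).


def assemble(blocks, rows_per_block, cols_per_block, num_rows, num_cols):
    """Hypothetical helper: place blocks into a dense num_rows x num_cols grid."""
    dense = [[0.0] * num_cols for _ in range(num_rows)]
    for (block_row, block_col), (m, n, values) in blocks:
        # A block may be smaller than the block size, but never larger.
        if m > rows_per_block or n > cols_per_block:
            raise ValueError(
                f"block {(block_row, block_col)} is {m}x{n}, "
                f"larger than {rows_per_block}x{cols_per_block}"
            )
        # Block indices count blocks, so the offsets are index * block size.
        row_offset = block_row * rows_per_block
        col_offset = block_col * cols_per_block
        for j in range(n):
            for i in range(m):
                # Column-major: entry (i, j) is stored at values[i + j * m].
                dense[row_offset + i][col_offset + j] = values[i + j * m]
    return dense


# A 1x2 block at block index (1, 0) in a grid of 2x2 blocks lands in row 2,
# columns 0-1: the upper left corner of its block, leaving row 3 empty.
for row in assemble([((1, 0), (1, 2, [9.0, 4.0]))], 2, 2, 4, 4):
    print(row)
# [0.0, 0.0, 0.0, 0.0]
# [0.0, 0.0, 0.0, 0.0]
# [9.0, 4.0, 0.0, 0.0]
# [0.0, 0.0, 0.0, 0.0]
```

Read the other way round, a block index of (2, 0) would mean the third block row, which starts at row 4 when blocks are 2x2, not at row 2.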
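You'll have to restructure your data to match these criteria. Note also that each RDD element has to be a pair ((blockRowIndex, blockColIndex), sub-matrix), where the indices count blocks rather than rows and columns; a flat (row, col, matrix) triple like the ones in your code is what triggers the TypeError.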
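One way to do the restructuring, sketched in plain Python: cut the full 3x4 matrix into a regular grid of 2x2 blocks, so that only the last block row comes out smaller (1x2). The function name `split_into_blocks` and the `((block_row, block_col), (m, n, values))` output shape are hypothetical; values are emitted in column-major order because that is what `Matrices.dense` expects. The commented PySpark lines at the end show how the result would presumably be wrapped, but they assume a running SparkContext `sc` and were not executed as part of this sketch.

```python
# Sketch: cut a dense row-major matrix into a regular grid of blocks.
# Output shape (hypothetical): ((block_row, block_col), (m, n, column_major_values)).


def split_into_blocks(rows, rows_per_block, cols_per_block):
    num_rows, num_cols = len(rows), len(rows[0])
    blocks = []
    for r0 in range(0, num_rows, rows_per_block):
        for c0 in range(0, num_cols, cols_per_block):
            # Blocks in the last block row / column may come out smaller.
            r1 = min(r0 + rows_per_block, num_rows)
            c1 = min(c0 + cols_per_block, num_cols)
            # Column-major order: walk down each column, then move right.
            values = [rows[i][j] for j in range(c0, c1) for i in range(r0, r1)]
            blocks.append(((r0 // rows_per_block, c0 // cols_per_block),
                           (r1 - r0, c1 - c0, values)))
    return blocks


matrix = [[7.0, 6.0, 3.0, 0.0],
          [3.0, 2.0, 5.0, 1.0],
          [9.0, 4.0, 0.0, 3.0]]

for block in split_into_blocks(matrix, 2, 2):
    print(block)
# ((0, 0), (2, 2, [7.0, 3.0, 6.0, 2.0]))
# ((0, 1), (2, 2, [3.0, 5.0, 0.0, 1.0]))
# ((1, 0), (1, 2, [9.0, 4.0]))
# ((1, 1), (1, 2, [0.0, 3.0]))

# With PySpark available (not required for this sketch), the blocks would become:
#   from pyspark.mllib.linalg import Matrices
#   from pyspark.mllib.linalg.distributed import BlockMatrix
#   rdd = sc.parallelize([(idx, Matrices.dense(m, n, v))
#                         for idx, (m, n, v) in split_into_blocks(matrix, 2, 2)])
#   block_matrix = BlockMatrix(rdd, 2, 2)
```

Note how the 3x2 right-hand sub-matrix from the question does not survive as a single block: it has to be split into a full 2x2 block at (0, 1) and a smaller 1x2 block at (1, 1).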