apache-spark apache-spark-mllib

API inconsistency of Spark mllib's DistributedMatrix subclasses


In Spark's MLlib, why are the computational interfaces of the different distributed matrix types inconsistent? For example, RowMatrix and IndexedRowMatrix provide the computeSVD method, while CoordinateMatrix and BlockMatrix do not.

Why is this?


Solution

  • This is because the SVD algorithm needs a row-oriented (or column-oriented) matrix format.

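    If CoordinateMatrix and BlockMatrix exposed a computeSVD method, it would have to trigger a (potentially expensive) conversion to a row-oriented format under the hood. Leaving the method out keeps that cost visible: you convert explicitly first (for example with toIndexedRowMatrix) and then call computeSVD on the result.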
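
    To make the layout argument concrete, here is a minimal plain-Python sketch (not Spark code). It assumes the SVD is driven by the Gramian matrix AᵀA, which can be accumulated one row at a time; that is why a row-oriented layout is convenient. The helper names entries_to_rows and gramian are hypothetical and exist only for this illustration. The grouping step stands in for the shuffle a distributed conversion would need.

```python
from collections import defaultdict


def entries_to_rows(entries):
    """Group (row, col, value) entries by row index.

    This mimics the conversion from a coordinate layout to a
    row-oriented layout. In a distributed setting this grouping
    is a shuffle, which is the potentially expensive part.
    """
    rows = defaultdict(dict)
    for i, j, v in entries:
        rows[i][j] = v
    return dict(rows)


def gramian(rows, num_cols):
    """Accumulate A^T A as a sum of per-row outer products.

    Each row contributes independently, so this only works
    naturally once the data is already organised by row.
    """
    g = [[0.0] * num_cols for _ in range(num_cols)]
    for row in rows.values():
        for j1, v1 in row.items():
            for j2, v2 in row.items():
                g[j1][j2] += v1 * v2
    return g


# A 3x2 matrix given as coordinate entries:
# [[1, 2],
#  [0, 3],
#  [4, 0]]
entries = [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0), (2, 0, 4.0)]

rows = entries_to_rows(entries)   # the "conversion" step
g = gramian(rows, num_cols=2)     # row-wise accumulation of A^T A
```

    With entries scattered in coordinate form, no single record holds a full row, so the grouping step cannot be skipped. In Spark itself the equivalent explicit step would be a call such as toRowMatrix or toIndexedRowMatrix on a CoordinateMatrix, or toIndexedRowMatrix on a BlockMatrix, before calling computeSVD.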