Search code examples
javascalasparse-matrixcompression

What library can I use for calc large sparse matrix in Java or Scala?


When I use large sparse matrix, it's better to use compressed matrix like CCS, CRS and so on.

I tried to use ScalaNLP, la4j, colc to calc 100,000*100,000 sparse matrix. There are some problems.

  1. Breeze (ScalaNLP/Scalala)

    • it gives me the CSCMatrix type which can have 100,000*100,000 size.
    • but the problem is it's under development.
    • so we cannot calc element-wise product of CSCMatrix with CSCMatrix, like csc1 :* csc2.
    • and also you cannot add CSCMatrixes to each other.
  2. la4j

    • It has CCSMatrix and CRSMatrix.
    • but when creating (new CCSMatrixFactory).createMatrix(100000, 100000), it occur OutOfMemoryError.
    • The matrix should be zeros, so it should not use large memory spaces.
  3. colc

    • It has SparseDoubleMatrix2D.
    • but when creating matrix like new SparseDoubleMatrix2d(100000, 100000), it says IllegalArgumentException: matrix too large.

To calc large sparse matrix, what library can I use? could you show me the example?


Solution

  • I was curious with Breeze, so I looked into the source. It's a bit messy because the operators are all emitted from some println style code generation (!)... But I came up with this:

    import breeze.linalg.operators.{BinaryOp, OpMulScalar}
    
    object CSCMatrixExtraOps {
      abstract class CSCMatrixCanMulM_M[@specialized (Int, Float, Long, Double) A]
        extends BinaryOp[CSCMatrix[A], CSCMatrix[A], OpMulScalar, CSCMatrix[A]] {
    
        protected def times(a: A, b: A): A
    
        protected def zeros  (rows: Int, cols: Int): CSCMatrix[A]
        protected def builder(rows: Int, cols: Int, sz: Int): CSCMatrix.Builder[A]
    
        final def apply(a: CSCMatrix[A], b: CSCMatrix[A]): CSCMatrix[A] = {
          val rows  = a.rows
          val cols  = a.cols
          require(rows == b.rows, "Matrices must have same number of rows!")
          require(cols == b.cols, "Matrices must have same number of cols!")
    
          if (cols == 0) return zeros(rows, cols)
    

     

          val res     = builder(rows, cols, math.min(a.activeSize, b.activeSize))
          var ci      = 0
          var acpStop = a.colPtrs(0)
          var bcpStop = b.colPtrs(0)
          while (ci < cols) {
            val ci1 = ci + 1
            var acp = acpStop
            var bcp = bcpStop
            acpStop = a.colPtrs(ci1)
            bcpStop = b.colPtrs(ci1)
            while (acp < acpStop && bcp < bcpStop) {
              val ari = a.rowIndices(acp)
              val bri = b.rowIndices(bcp)
              if (ari == bri) {
                val v = times(a.data(acp), b.data(bcp))
                res.add(ari, ci, v)
                acp += 1
                bcp += 1
              } else if (ari < bri) {
                acp += 1
              } else /* ari > bri */ {
                bcp += 1
              }
            }
            ci = ci1
          }
    
          res.result()
        }
      }
    

     

      implicit object CSCMatrixCanMulM_M_Int extends CSCMatrixCanMulM_M[Int] {
        protected def times(a: Int, b: Int) = a * b
        protected def zeros(rows: Int, cols: Int) = CSCMatrix.zeros(rows, cols)
        protected def builder(rows: Int, cols: Int, sz: Int) = 
          new CSCMatrix.Builder(rows, cols, sz)
      }
    
      implicit object CSCMatrixCanMulM_M_Double extends CSCMatrixCanMulM_M[Double] {
        protected def times(a: Double, b: Double) = a * b
        protected def zeros(rows: Int, cols: Int) = CSCMatrix.zeros(rows, cols)
        protected def builder(rows: Int, cols: Int, sz: Int) = 
          new CSCMatrix.Builder(rows, cols, sz)
      }
    }
    

    Example:

    import breeze.linalg._
    import CSCMatrixExtraOps._
    
    val m1 = CSCMatrix((0, 0, 0), (0, 5, 0), (0, 0, 10), (0, 13, 0))
    val m2 = CSCMatrix((0, 0, 0), (0, 5, 0), (0, 0, 10), (13, 0, 0))
    (m1 :* m2).toDenseMatrix
    

    Result:

    0  0   0    
    0  25  0    
    0  0   100  
    0  0   0