
How to combine or merge two sparse vectors in Spark using Java?


I am using the Java API of Apache Spark 1.2.0 and created two sparse vectors as follows.

Vector v1 = Vectors.sparse(3, new int[]{0, 2}, new double[]{1.0, 3.0});
Vector v2 = Vectors.sparse(2, new int[]{0, 1}, new double[]{4,5});

How can I get a new vector v3 formed by concatenating v1 and v2? The expected result is: (5, [0,2,3,4], [1.0, 3.0, 4.0, 5.0])


Solution

  • This question has been open for a year without an answer, so I solved the problem by writing a helper function myself, as follows.

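    import org.apache.spark.mllib.linalg.SparseVector;

    /**
     * Concatenates sparse vectors end to end. Vectors.sparse(...) returns a Vector,
     * so cast the arguments first, e.g.
     * combineSparseVectors((SparseVector) v1, (SparseVector) v2).
     */
    public static SparseVector combineSparseVectors(SparseVector... svs) {
        // Total length and total number of stored (non-zero) entries.
        int size = 0;
        int nonzeros = 0;
        for (SparseVector sv : svs) {
            size += sv.size();
            nonzeros += sv.indices().length;
        }

        int[] indices = new int[nonzeros];
        double[] values = new double[nonzeros];

        int pos = 0;     // next free slot in indices/values
        int offset = 0;  // combined length of the vectors already copied
        for (SparseVector sv : svs) {
            int[] indicesSV = sv.indices();
            double[] valuesSV = sv.values();
            for (int k = 0; k < indicesSV.length; k++) {
                // Shift each index by the length of everything before this vector.
                indices[pos] = indicesSV[k] + offset;
                values[pos] = valuesSV[k];
                pos++;
            }
            offset += sv.size();
        }

        // With no stored entries this is an all-zero vector of the combined size,
        // returned instead of null so callers need no special case.
        return new SparseVector(size, indices, values);
    }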
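
The index arithmetic in the helper can be checked without a Spark dependency. Below is a minimal sketch on plain arrays. The class `SparseConcatSketch` and its `concat` method are hypothetical names introduced only for this illustration and are not part of Spark's API. Each part's indices are shifted by the combined size of the parts before it, while the values are copied unchanged.

```java
/**
 * Hypothetical stand-in for a sparse vector (not a Spark class):
 * a logical size plus parallel arrays of indices and values.
 */
final class SparseConcatSketch {
    final int size;
    final int[] indices;
    final double[] values;

    SparseConcatSketch(int size, int[] indices, double[] values) {
        this.size = size;
        this.indices = indices;
        this.values = values;
    }

    /** Concatenates the parts end to end, shifting indices by a running offset. */
    static SparseConcatSketch concat(SparseConcatSketch... parts) {
        int size = 0;
        int nonzeros = 0;
        for (SparseConcatSketch p : parts) {
            size += p.size;
            nonzeros += p.indices.length;
        }

        int[] indices = new int[nonzeros];
        double[] values = new double[nonzeros];
        int pos = 0;     // next free slot in the output arrays
        int offset = 0;  // combined size of the parts already copied
        for (SparseConcatSketch p : parts) {
            for (int k = 0; k < p.indices.length; k++) {
                indices[pos] = p.indices[k] + offset;
                values[pos] = p.values[k];
                pos++;
            }
            offset += p.size;
        }
        return new SparseConcatSketch(size, indices, values);
    }
}
```

With the two vectors from the question, v2's indices 0 and 1 are shifted by v1's size of 3 to become 3 and 4, which matches the expected result (5, [0,2,3,4], [1.0, 3.0, 4.0, 5.0]).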