java, apache-spark, matrix, apache-spark-mllib, pearson-correlation

How to print all the columns of the Pearson matrix in Java


I am dealing with the same problem as in "How to print all the columns of a Matrix", but in Java (with Spark MLlib).

A Pearson correlation matrix is created with the code below. I want to print all 13 columns of the matrix, but I cannot work out how to get at them.

The code is:

Row r1 = Correlation.corr(output, "intensity").head();
System.out.println("Pearson correlation matrix:\n" + r1.get(0).toString());

and the r1 schema is:

root
|-- pearson(intensity): matrix (nullable = false)

What I get as a result is:

Pearson correlation matrix:

1.0                  -1.000000000000013  -0.9999999999999991  ... (13 total)
-1.000000000000013   1.0                 1.0000000000000069   ...
-0.9999999999999991  1.0000000000000069  1.0                  ...
-0.9999999999999983  0.999999999999994   1.0000000000000009   ...
-0.9999999999999983  0.999999999999994   1.0000000000000009   ...
-0.9999999999999983  0.999999999999994   1.0000000000000009   ...
-1.0000000000001108  1.0000000000000644  1.000000000000129    ...
-0.9999999999999983  0.999999999999994   1.0000000000000009   ...
1.0                  -1.000000000000005  -0.9999999999999989  ...
-1.0000000000000029  1.0000000000000056  1.0000000000000009   ...
-1.0000000000000036  1.000000000000003   1.0000000000000018   ...
-0.9999999999999999  1.000000000000003   1.0000000000000002   ...
0.9999999999999989   -1.000000000000012  -0.9999999999999987  ...

I think there is a Matrix (maybe a DenseMatrix) in the row r1. How can I access this matrix, and how can I print all 13 columns?

With head() (or show(false), if I keep the result as a Dataset instead), only the first 3 columns and the "13 total" note appear.

I am fairly new to Spark with Java. Please help! Thank you very much in advance!


Solution

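  • After a lot of effort, I realized it was much simpler than I thought, so I am posting the answer. The dataset returned by Correlation.corr has a single row, and that row holds a Matrix. The matrix's default toString() abbreviates wide output, which is why only three columns showed up. All you have to do is take the first row, get the Matrix out of it, and print its entries yourself: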

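    import org.apache.spark.ml.linalg.Matrix;
    import org.apache.spark.ml.stat.Correlation;
    import org.apache.spark.sql.Row;

    // corr() returns a one-row Dataset; its only column holds the correlation Matrix.
    Row r1 = Correlation.corr(output, "intensity").head();
    Matrix r2 = r1.getAs(0);
    // Print every entry, one matrix row per output line.
    for (int i = 0; i < r2.numRows(); i++) {
        for (int j = 0; j < r2.numCols(); j++) {
            System.out.print(r2.apply(i, j) + "  ");
        }
        System.out.println();
    }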
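
The loop above prints each value at full double precision, so the 13 columns do not line up and entries such as -1.000000000000013 are hard to scan. The following is a minimal sketch of a fixed-width formatter that runs without Spark. The class name FullMatrixPrinter and its format method are hypothetical helpers, not part of Spark, and the sketch assumes the values arrive in column-major order, which is the layout Matrix.toArray() returns.

```java
import java.util.Locale;

// Hypothetical helper, not part of Spark: formats a matrix whose values are
// stored in column-major order (assumed to match what Matrix.toArray() returns).
final class FullMatrixPrinter {

    // Returns the whole matrix as text, one matrix row per line,
    // each value right-aligned in 8 characters with 4 decimal places.
    static String format(double[] values, int numRows, int numCols) {
        if (values.length != numRows * numCols) {
            throw new IllegalArgumentException(
                "expected " + (numRows * numCols) + " values, got " + values.length);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < numRows; i++) {
            for (int j = 0; j < numCols; j++) {
                if (j > 0) {
                    sb.append(' ');
                }
                // Column-major layout: element (i, j) is at index i + j * numRows.
                sb.append(String.format(Locale.ROOT, "%8.4f", values[i + j * numRows]));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The 2x3 matrix [[1, 2, 3], [4, 5, 6]] written in column-major order.
        double[] values = {1, 4, 2, 5, 3, 6};
        System.out.print(format(values, 2, 3));
    }
}
```

With the matrix from the answer, the call would presumably be FullMatrixPrinter.format(r2.toArray(), r2.numRows(), r2.numCols()). If the default layout is acceptable and only the truncation is the problem, Matrix also has a toString(int maxLines, int maxLineWidth) overload that may be enough on its own.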