Seeing all columns of a correlation matrix in Spark using Scala


I am trying to train a k-means model and am currently at the stage of checking correlations within my feature vectors.

When I run a Pearson correlation against my feature vector column, I cannot see the results for all of my features.

The code I am running is:

import org.apache.spark.ml.stat.Correlation

val cor = Correlation.corr(scoringDf, "features")
cor.show(false)

The correlation runs fine, but when I try to view the results with the show method (Correlation.corr returns a DataFrame), the output is truncated:

|1.0                  0.18047211468479446  0.08002566273874058   ... (5 total)
0.18047211468479446  1.0                  0.02926796076983553   ...
0.08002566273874058  0.02926796076983553  1.0                   ...
0.30256416877032244  0.15974389490583188  0.054692657400425136  ...
0.3408783412055776   0.13008391583866225  0.04241296238931376   ...|

Is there a way to see the hidden columns?

I have also tried the following code, but the results are the same.

import org.apache.spark.ml.linalg.Matrix
import org.apache.spark.sql.Row

val Row(coeff1: Matrix) = Correlation.corr(scoringDf, "features").head
println(s"Pearson correlation matrix:\n $coeff1")

Edit:

Here is the schema of the cor DataFrame:

root
 |-- pearson(features): matrix (nullable = false)

Solution

  • I finally got the output I wanted by passing explicit limits to the matrix's toString method. I changed my code to look like this:

    val Row(coeff1: Matrix) = Correlation.corr(scoringDf, "features").head
    // toString(maxLines, maxLineWidth): a large width limit keeps columns from being elided
    println(s"Pearson correlation matrix:\n " + coeff1.toString(10, 100000))
    

    All five columns are now displayed:

    Pearson correlation matrix:
     1.0                  0.1804721146847944   0.08002566273874055   0.3025641687703226   0.34087834120557725   
    0.1804721146847944   1.0                  0.02926796076983553   0.15974389490583193  0.13008391583866233   
    0.08002566273874055  0.02926796076983553  1.0                   0.05469265740042514  0.042412962389313726  
    0.3025641687703226   0.15974389490583193  0.05469265740042514   1.0                  0.241118490251708     
    0.34087834120557725  0.13008391583866233  0.042412962389313726  0.241118490251708    1.0
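
For anyone who wants to try this without a running Spark job, here is a minimal sketch of the same idea. It assumes the spark-mllib-local artifact is on the classpath. The 3x3 matrix is made-up stand-in data, not the result above, and the object name PrintFullMatrix and its format helper are hypothetical names introduced for illustration. It shows the toString(maxLines, maxLineWidth) call from the solution, plus an alternative that builds the text row by row, so the output does not depend on any width limit and the number of decimals can be chosen.

```scala
import java.util.Locale
import org.apache.spark.ml.linalg.{Matrices, Matrix}

object PrintFullMatrix {
  // Hypothetical helper: render every row with a fixed number of decimals.
  // No column is ever elided because the string is assembled by hand.
  def format(m: Matrix, decimals: Int = 4): String = {
    val fmt = s"%.${decimals}f"
    m.rowIter
      .map(row => row.toArray.map(v => fmt.formatLocal(Locale.ROOT, v)).mkString("  "))
      .mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    // Made-up symmetric matrix standing in for the output of Correlation.corr.
    // Matrices.dense expects the values in column-major order.
    val m: Matrix = Matrices.dense(3, 3, Array(
      1.0, 0.18, 0.08,
      0.18, 1.0, 0.03,
      0.08, 0.03, 1.0))

    // Same call as in the solution: up to 10 lines, each up to 100000 characters wide.
    println(m.toString(10, 100000))

    // Alternative: hand-built output with two decimals per value.
    println(format(m, decimals = 2))
  }
}
```

Note that the first argument of toString limits the number of printed lines, so for a matrix with more than a handful of features it would need to be raised as well. The row-by-row variant avoids both limits, at the cost of losing the automatic column alignment.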