scala, jupyter-notebook, apache-toree, vegas-viz

Jupyter Notebook (Scala, kernel - Apache Toree) with Vegas, Graph not showing data


I'm using Jupyter (kernel: Apache Toree) for analytics with Apache Spark/Scala. For visualization, I'm trying to use Vegas (GitHub: https://github.com/vegas-viz/Vegas).

When I use the sample Vegas code, without the Vegas Spark extension, it works fine (please see the attached screenshot).

However, when I pass in a DataFrame, the graph does not show any data.

Here is the code:

%AddDeps org.vegas-viz vegas_2.11 0.3.11 --transitive

%AddDeps org.vegas-viz vegas-spark_2.11 0.3.11

import vegas._
import vegas.render.WindowRenderer._
import vegas.data.External._
import vegas.sparkExt._

val seq = Seq(("a", 16), ("b", 77), ("c", 45), ("d",101),("e", 132),("f", 166),("g", 51))
val df = seq.toDF("id", "value")

df.show()

+---+-----+
| id|value|
+---+-----+
|  a|   16|
|  b|   77|
|  c|   45|
|  d|  101|
|  e|  132|
|  f|  166|
|  g|   51|
+---+-----+

val usingSparkdf = Vegas("UsingSpark")
  .withDataFrame(df)
  .encodeX("id")
  .encodeY("value")
  .mark(Bar)

usingSparkdf.show

[Screenshot: Vegas-with-DF]

[Screenshot: Vegas-without-DF]

What am I doing wrong here?

Is this the correct way to include the Spark extension?

%AddDeps org.vegas-viz vegas-spark_2.11 0.3.11

Solution

  • I was able to fix this issue: encodeX and encodeY need the data (measurement) type specified, i.e. Quant, Nom or Ord, along with the column name.

    The code below works fine.

    val usingSparkdf = Vegas("UsingSpark")
      .withDataFrame(df)
      .encodeX("id", Nom)
      .encodeY("value", Quant)
      .mark(Bar)

    usingSparkdf.show
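
As a rough aid for picking the type argument, the sketch below maps the type strings that Spark's df.dtypes reports (for example "StringType" or "IntegerType") to the names Quant and Nom used in the fix above. EncodingTypes, suggest and suggestAll are hypothetical names invented for this illustration, not part of Vegas or Spark, and the numeric-means-Quant rule is only a rule of thumb: Ord still has to be chosen by hand, and a numeric ID column would normally be Nom.

```scala
// Hypothetical helper, not part of Vegas or Spark: suggests an encoding type
// name for a column from the type string that df.dtypes reports for it.
object EncodingTypes {
  // Spark SQL numeric type names as they appear in df.dtypes.
  private val numeric = Set(
    "ByteType", "ShortType", "IntegerType", "LongType", "FloatType", "DoubleType")

  // Assumption: numeric columns are measurements (Quant); everything else is
  // treated as a category (Nom). DecimalType carries precision, e.g. "DecimalType(10,0)".
  def suggest(sparkType: String): String =
    if (numeric.contains(sparkType) || sparkType.startsWith("DecimalType")) "Quant"
    else "Nom"

  def suggestAll(dtypes: Seq[(String, String)]): Seq[(String, String)] =
    dtypes.map { case (column, sparkType) => column -> suggest(sparkType) }
}

// These pairs mirror the question's DataFrame (a String column and an Int column).
val suggestions = EncodingTypes.suggestAll(Seq("id" -> "StringType", "value" -> "IntegerType"))
suggestions.foreach { case (column, tpe) => println(s"$column -> $tpe") }
// prints:
// id -> Nom
// value -> Quant
```

In the notebook, the input could be df.dtypes.toSeq instead of the hard-coded pairs. The result is only a suggestion to type into encodeX and encodeY by hand.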