
Apache Zeppelin not returning aggregate data


I am running Apache Spark 2.0.1 and Apache Zeppelin 0.6.2.

In Zeppelin, I have the following paragraph:

val df = sqlContext
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map( "table" -> "iot_data2", "keyspace" -> "iot" ))
  .load()

import org.apache.spark.sql.functions.{avg,round}

val ts = $"updated_time".cast("long")

val interval = (round(ts / 3600L) * 3600.0).cast("timestamp").alias("time")

df.groupBy($"a", $"b", $"date_bucket", interval).avg("t").createOrReplaceTempView("iot_avg")
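
A side note on the bucketing expression: round(ts / 3600L) * 3600.0 snaps each timestamp to the nearest full hour, not to the start of the hour it falls in. The plain-Scala sketch below mirrors that arithmetic on epoch seconds so the difference is easy to see. The names nearestHour and startOfHour are hypothetical helpers for illustration, not part of Spark.

```scala
// Hypothetical helpers mirroring the arithmetic on epoch seconds (not Spark API).

// Same idea as round(ts / 3600L) * 3600.0: snap to the nearest full hour.
def nearestHour(epochSeconds: Long): Long =
  math.round(epochSeconds / 3600.0) * 3600L

// Alternative: truncate to the start of the hour (integer division drops the remainder).
def startOfHour(epochSeconds: Long): Long =
  (epochSeconds / 3600L) * 3600L

// 02:45:00 after the epoch, in seconds.
val t = 2 * 3600L + 45 * 60L

println(nearestHour(t)) // 10800, i.e. 03:00:00
println(startOfHour(t)) // 7200, i.e. 02:00:00
```

If each point on the chart should be labelled with the hour in which the readings were taken, floor from org.apache.spark.sql.functions would be the Spark counterpart of startOfHour.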

In the next paragraph I try to plot a graph, but the value of avg("t") is always 0:

%sql
select time,avg("t") as avg_t from ble_temp_avg where a = '${a}' and b = '${b}' group by time order by time

I think I am missing something really obvious, but as a new Spark and Zeppelin user I can't see what it is.


Solution

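  • This worked after I rewrote both paragraphs. In Spark SQL, double quotes denote a string literal by default, so avg("t") averaged the string 't' rather than the column t. In addition, groupBy(...).avg("t") had already replaced t with a column named avg(t), so the view no longer had a plain t column to average. The rewrite registers the unaggregated columns as the view and does the averaging in SQL: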

    In the first paragraph:

    val df = sqlContext
      .read
      .format("org.apache.spark.sql.cassandra")
      .options(Map( "table" -> "iot_data2", "keyspace" -> "iot" ))
      .load()
    
    import org.apache.spark.sql.functions.{avg,round}
    
    val ts = $"updated_time".cast("long")
    
    val interval = (round(ts / 3600L) * 3600.0).cast("timestamp").alias("time")
    
    df.select($"a", $"b", $"date_bucket", interval, $"t").createOrReplaceTempView("iot_avg")
    

    In the second paragraph:

    %sql
    select time,avg(t) as avg_t from iot_avg where a = 'test1' and b = 'test2' group by time order by time
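
If you would rather keep the aggregation in the Scala paragraph, one option is to give the aggregate column an explicit name with alias, so that the %sql paragraph can select it directly. The following is a minimal sketch, not the original setup: it assumes a local SparkSession, a small in-memory DataFrame with made-up rows in place of the Cassandra table, and a hypothetical column name avg_t.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, round}

// Assumption: a local session for illustration; in Zeppelin or spark-shell,
// getOrCreate() returns the session that already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("hourly-avg-sketch")
  .getOrCreate()
import spark.implicits._

// Made-up rows standing in for the Cassandra table iot.iot_data2.
// "epoch" holds seconds since the Unix epoch and is cast to a timestamp.
val df = Seq(
  ("test1", "test2", "1970-01", 7800L, 20.0),
  ("test1", "test2", "1970-01", 8400L, 22.0),
  ("test1", "test2", "1970-01", 14700L, 30.0),
  ("other", "test2", "1970-01", 7200L, 99.0)
).toDF("a", "b", "date_bucket", "epoch", "t")
  .withColumn("updated_time", $"epoch".cast("timestamp"))

val ts = $"updated_time".cast("long")
val interval = (round(ts / 3600L) * 3600.0).cast("timestamp").alias("time")

// Name the aggregate explicitly instead of relying on the generated name "avg(t)".
df.groupBy($"a", $"b", $"date_bucket", interval)
  .agg(avg($"t").alias("avg_t"))
  .createOrReplaceTempView("iot_avg")

// The view is already aggregated per hour, so no second avg() or GROUP BY is needed.
val result = spark.sql(
  "select time, avg_t from iot_avg where a = 'test1' and b = 'test2' order by time")
result.show()
```

With a view built this way, the %sql paragraph in Zeppelin could be reduced to: select time, avg_t from iot_avg where a = '${a}' and b = '${b}' order by time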