
Spark dataset alias column on-the-fly like for a dataframe


This may be a silly question, but for:

val ds3 = ds.groupBy($"ip")
            .avg("humidity") 

it is not clear how, for a Dataset (as opposed to a DataFrame), I can rename the resulting column on-the-fly, as with alias. I tried a few approaches, but to no avail: no errors when trying, but also no effect.

I would like "avg_humidity" as the column name.

Extending the question, what if I issue:

val ds3 = ds.groupBy($"ip")
            .avg() 

How would I handle renaming in that case?


Solution

  • avg does not provide an alias function, so you need an extra withColumnRenamed:

    val ds3 = ds.groupBy($"ip")
      .avg("humidity")
      .withColumnRenamed("avg(humidity)","avg_humidity")
    

    Instead, you can use .agg with an alias applied directly to the aggregate:

    val ds3 = ds.groupBy($"ip").agg(avg("humidity").as("avg_humidity"))
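    For the extended question, ds.groupBy($"ip").avg() with no arguments averages every numeric column, producing names like "avg(humidity)" for each. One way to rename them all (a sketch, not the only option: flatten and the "avg_" prefix are my own choices) is a small pure rename rule folded over the columns with withColumnRenamed:

    ```scala
    // Pure rename rule: "avg(humidity)" -> "avg_humidity".
    // Non-aggregated columns such as the grouping key "ip" do not
    // match the pattern and pass through unchanged.
    def flatten(name: String): String =
      name.replaceAll("""^avg\((.+)\)$""", "avg_$1")

    // Applied to the result (assumes a SparkSession and imports in scope):
    // val raw = ds.groupBy($"ip").avg()
    // val ds3 = raw.columns.foldLeft(raw.toDF) { (df, c) =>
    //   df.withColumnRenamed(c, flatten(c))
    // }
    ```

    Note that groupBy(...).avg() on a Dataset already returns a DataFrame, so the toDF call above is only there to make the fold's accumulator type explicit.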