scala, apache-spark

Is it possible to perform a group by taking in all the fields in the aggregate?


I am on Apache Spark 3.3.2. Here is a code sample:

val df: Dataset[Row] = ???

df
 .groupBy($"someKey")
 .agg(collect_set(???)) // I want to collect all the columns here, including the key.

As mentioned in the comment, I want to collect all the columns without having to spell each of them out again. Is there a way to do this?


Solution

  • If your intention is to aggregate all rows that match the same key into a list of JSON-like struct objects, you can do something like:

    import org.apache.spark.sql.functions._
    import spark.implicits._ // for the $"..." column syntax
    
    val df = spark.createDataFrame(Seq(
          ("steak", "1990-01-01", "2022-03-30", 150),
          ("steak", "2000-01-02", "2021-01-13", 180),
          ("fish",  "1990-01-01", "2001-02-01", 100)
        )).toDF("key", "startDate", "endDate", "price")
    
    df.show()
    
    df
     .groupBy("key")
     .agg(collect_set(struct($"*")).as("value"))
     .show(false)
    

    output:

    +-----+----------+----------+-----+
    |  key| startDate|   endDate|price|
    +-----+----------+----------+-----+
    |steak|1990-01-01|2022-03-30|  150|
    |steak|2000-01-02|2021-01-13|  180|
    | fish|1990-01-01|2001-02-01|  100|
    +-----+----------+----------+-----+
    
    +-----+----------------------------------------------------------------------------+
    |key  |value                                                                       |
    +-----+----------------------------------------------------------------------------+
    |steak|[{steak, 1990-01-01, 2022-03-30, 150}, {steak, 2000-01-02, 2021-01-13, 180}]|
    |fish |[{fish, 1990-01-01, 2001-02-01, 100}]                                       |
    +-----+----------------------------------------------------------------------------+
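
  • If what you actually need is a single JSON string per group rather than an array of structs, a minimal sketch (reusing the df above) is to wrap the aggregation in to_json, which accepts arrays of structs since Spark 2.4:

    df
     .groupBy("key")
     .agg(to_json(collect_set(struct($"*"))).as("value")) // serialize the array of structs to a JSON array string
     .show(false)

    Each value is then a JSON array string, something like [{"key":"fish","startDate":"1990-01-01",...}].

  • If, on the other hand, you would rather not repeat the grouping key inside each struct, one possible variation (a sketch, not part of the original answer) is to build the struct from every column except the key:

    // all columns except the grouping key, turned into Column objects
    val nonKeyCols = df.columns.filterNot(_ == "key").map(col)

    df
     .groupBy("key")
     .agg(collect_set(struct(nonKeyCols: _*)).as("value"))
     .show(false)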