
Aggregate grouped values on many specific columns from a list


I'd like to perform a groupBy() operation with a specific agg():

df = df.groupBy("x", "y").agg(F.max("a").alias("a"), F.max("b").alias("b"))

But is there a way to aggregate using a list of columns? I don't want to hardcode them.


Solution

  • You can build the aggregation expressions with a list comprehension and unpack them into agg() with *.

    list_of_cols = ["a", "b"]
    df = df.groupBy("x", "y").agg(*[F.max(x).alias(x) for x in list_of_cols])
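For a self-contained illustration that runs without a Spark session, here is a plain-Python sketch of the same pattern: group rows by ("x", "y") and take the max of every column named in a list. The function name, sample data, and column names are hypothetical, chosen only to mirror the PySpark call above.

```python
from collections import defaultdict

def group_max(rows, group_cols, agg_cols):
    """Group dict-rows by group_cols, then take the max of each agg_col.

    Mirrors df.groupBy(*group_cols).agg(*[F.max(c).alias(c) for c in agg_cols]).
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        groups[key].append(row)

    result = []
    for key, members in groups.items():
        out = dict(zip(group_cols, key))
        for c in agg_cols:
            # Like F.max(c).alias(c): max over the group, kept under the same name.
            out[c] = max(m[c] for m in members)
        result.append(out)
    return result

rows = [
    {"x": 1, "y": "u", "a": 3, "b": 9},
    {"x": 1, "y": "u", "a": 7, "b": 2},
    {"x": 2, "y": "v", "a": 5, "b": 4},
]
print(group_max(rows, ["x", "y"], ["a", "b"]))
# [{'x': 1, 'y': 'u', 'a': 7, 'b': 9}, {'x': 2, 'y': 'v', 'a': 5, 'b': 4}]
```

The key idea carries over directly: build the per-column aggregations programmatically from the list, instead of writing one expression per column by hand.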