Tags: pyspark, aggregate, iris-dataset

Round the results of an aggregate table (PySpark)


Hello, how would I round the values in the table output by this code?

from pyspark.sql.functions import *
exprs = {x: "sum" for x in data2.columns[:4]}
data2.groupBy("Species").agg(exprs).show() 

What I've tried:

round(data2.groupBy("Species").agg(exprs),2).show() #not ok

data2.groupBy("Species").agg(exprs).show().round(2) # not ok

Solution

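  • round operates on a single column, not on a whole DataFrame, and show() only prints the table and returns None, so nothing can be chained after it. You therefore have to call round on each aggregated column before calling show(), e.g.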

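    from pyspark.sql import functions as F  # avoids shadowing Python's built-in sum and round

    agg_cols = data2.columns[:4]
    # Sum each selected column per species, keeping the original column names
    exprs = [F.sum(F.col(x)).alias(x) for x in agg_cols]
    aggregated_df = data2.groupBy("Species").agg(*exprs)
    # Round every aggregated column to 2 decimals and keep its name
    aggregated_df.select("Species", *[F.round(F.col(c), 2).alias(c) for c in agg_cols]).show()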