pyspark, group-by

PySpark: use groupBy() with colRegex


I am trying to combine groupBy() with colRegex. I want to automatically group by all columns with the prefix "B_" and aggregate the last column, "Prio", with max.

But whatever I try, it doesn't work.

df_calc_new = df_calc_new.groupBy((sf.col(colRegex('`B_.*`')) for x in [*df_calc_values_p1.columns])).agg(max(col("Prio")))

Solution

  • You can simply select the columns in pure Python and pass them to groupBy():

    group_cols = [c for c in df_calc_values_p1.columns if c.startswith("B_")]
    df_calc_new = df_calc_new.groupBy(*group_cols).agg(sf.max("Prio"))
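
The column-selection idea above can be sketched as a small reusable helper. This is only an illustration under a few assumptions: the helper names `columns_matching` and `max_prio_by_group` are hypothetical, `sf` is assumed to stand for `pyspark.sql.functions` (as the `sf.col` call in the question suggests), and a regular expression is used in place of `startswith` so the pattern from the question (`B_.*`) can be reused.

```python
import re


def columns_matching(columns, pattern):
    """Return the column names that fully match the regex, in original order.

    Hypothetical helper: plain Python, no Spark needed.
    """
    regex = re.compile(pattern)
    return [c for c in columns if regex.fullmatch(c)]


def max_prio_by_group(df, pattern="B_.*", value_col="Prio"):
    """Group a Spark DataFrame by every column matching `pattern`
    and take the maximum of `value_col` within each group.

    Hypothetical helper; assumes `df` is a pyspark.sql.DataFrame.
    """
    # Imported here so columns_matching stays usable without Spark installed.
    from pyspark.sql import functions as sf

    group_cols = columns_matching(df.columns, pattern)
    # groupBy accepts column names as separate arguments, hence the unpacking.
    # sf.max is the Spark aggregate, not Python's built-in max.
    return df.groupBy(*group_cols).agg(sf.max(value_col).alias(value_col))
```

With the DataFrame from the question, the call would look like `df_calc_new = max_prio_by_group(df_calc_new)`. Resolving the names in Python first means groupBy() receives an explicit list of columns, which avoids passing it a generator or a regex column.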