python · pyspark · etl

transform dataframe: How do you group string columns in PySpark?


I am currently working with the following DataFrame:

    A  B  C     D            E
    1  2  some  null         something A
    1  2  some  something B  null

And I need the following output:

    A  B  C     D            E
    1  2  some  something B  something A

My problem is that I can't manage to do a groupBy over the string columns.

I tried using a self-join and a pivot.


Solution

  • What about something like this?

    from pyspark.sql import functions as F
    
    # Group by the columns whose values are identical across rows,
    # then take the max of every remaining column; F.max ignores
    # nulls, so the single non-null string per group survives.
    cols_for_groupby = ["A", "B", "C"]
    (
        df
        .groupby(cols_for_groupby)
        .agg(*[
            F.max(c).alias(c)
            for c in df.columns if c not in cols_for_groupby
        ])
    )
    

    If df is the DataFrame in the question, the result is:

    +---+---+----+-----------+-----------+
    |  A|  B|   C|          D|          E|
    +---+---+----+-----------+-----------+
    |  1|  2|some|something B|something A|
    +---+---+----+-----------+-----------+
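
  • The reason this works is that the per-group max is computed only over non-null values, so each group collapses to its single non-null string per column. A plain-Python sketch of that aggregation logic, using the question's sample rows (no Spark installation needed to follow along):

    ```python
    from collections import defaultdict

    # Sample rows mirroring the question's DataFrame (None plays the role of null)
    rows = [
        {"A": 1, "B": 2, "C": "some", "D": None, "E": "something A"},
        {"A": 1, "B": 2, "C": "some", "D": "something B", "E": None},
    ]
    group_cols = ["A", "B", "C"]
    agg_cols = ["D", "E"]

    # Bucket rows by their group key, like groupby(cols_for_groupby)
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in group_cols)].append(row)

    # For each group, take the max over non-null values, like F.max
    result = []
    for key, members in groups.items():
        out = dict(zip(group_cols, key))
        for c in agg_cols:
            vals = [m[c] for m in members if m[c] is not None]
            out[c] = max(vals) if vals else None
        result.append(out)

    print(result)
    # [{'A': 1, 'B': 2, 'C': 'some', 'D': 'something B', 'E': 'something A'}]
    ```

    The two rows collapse into one, with D and E each holding the group's non-null value, matching the Spark output above.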