I have a PySpark DataFrame:
| id | events |
|----|--------|
| a0 | a-markets-l1 |
| a0 | a-markets-watch |
| a0 | a-markets-buy |
| c7 | a-markets-z2 |
| c7 | scroll_down |
| a0 | a-markets-sell |
| b2 | next_screen |
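In case it helps, here is a minimal snippet that reproduces the DataFrame (the SparkSession setup is just my local test harness, not part of the real pipeline):

```python
from pyspark.sql import SparkSession

# Local session only for reproducing the example data.
spark = SparkSession.builder.appName("events-demo").getOrCreate()

df_events = spark.createDataFrame(
    [
        ("a0", "a-markets-l1"),
        ("a0", "a-markets-watch"),
        ("a0", "a-markets-buy"),
        ("c7", "a-markets-z2"),
        ("c7", "scroll_down"),
        ("a0", "a-markets-sell"),
        ("b2", "next_screen"),
    ],
    ["id", "events"],
)
```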
I am trying to join the events into one space-separated string per id. Here's my Python code, which is pandas syntax, so it doesn't work on a Spark DataFrame:
```python
df_events_userpath = (
    df_events.groupby("id")
    .agg({"events": lambda x: " ".join(x)})
    .reset_index()
)
```
The output I want:

| id | events |
|----|--------|
| a0 | a-markets-l1 a-markets-watch a-markets-buy a-markets-sell |
| c7 | a-markets-z2 scroll_down |
| b2 | next_screen |
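I know that for small data I could convert to pandas first, which makes the code above work, but toPandas() collects everything onto the driver, which I'd rather avoid:

```python
# Works, but pulls the whole DataFrame onto the driver first.
pdf = df_events.toPandas()
df_events_userpath = (
    pdf.groupby("id")["events"]
    .agg(" ".join)
    .reset_index()
)
```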
I have tried using collect_set, but it returns an array rather than a joined string, and it also deduplicates and doesn't preserve order:

```python
from pyspark.sql import functions as f

df_events.groupBy("id").agg(f.collect_set("events").alias("events"))
```
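Is collect_list combined with concat_ws the right approach? Something like this sketch (though I'm not sure collect_list guarantees the original row order after a shuffle):

```python
from pyspark.sql import functions as f

# Collect each group's events into an array, then join with a space.
df_events_userpath = (
    df_events.groupBy("id")
    .agg(f.concat_ws(" ", f.collect_list("events")).alias("events"))
)
df_events_userpath.show(truncate=False)
```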