
Pyspark: how to filter rows for multiple criteria?


Say I have a dataset such as this:

| game_name | developer_name |
| --------- | -------------- |
| X | John |
| Y | Mark |
| X | Mark |
| Z | John |
| Y | John |

How do I find the list of all game_name values that two developers both worked on (e.g. all game names that both John and Mark worked on)?


Solution

  • Group by the game_name column and collect the developer_name values into a list for each game:

    from pyspark.sql import functions as F
    df.groupBy('game_name').agg(F.collect_list('developer_name').alias('developer_name'))