Search code examples
apache-sparkpyspark

How to extract selected elements from array using "explode" in Pyspark


I have a Spark dataframe spdf with data that looks as follows:

player_name    team_history

John           [{Rangers, Center, Active}, {Blackhawks, Center, Former}, {Kings, Center, Former}],
Bob            [{Devils, Defense, Active}, {Maple Leafs, Defense, Former}, {Canadiens, Defense, Former}]

The schema is:

hockey_schema = StructType([

     StructField("player_name", StringType(), True),

     StructField("team_history", ArrayType(
         StructType([
             StructField("team",      StringType(), True),
             StructField("position",  StringType(), True),
             StructField("status",    StringType(), True),
         ])), True)

   ])

The JSON looks like this:

[{ "player_name" : "John", "team_history" : [ { "team" : "Rangers", "position" : "Center", "status" : "Active" }, { "team" : "Blackhawks", "position" : "Center", "status" : "Former"}, { "team" : "Kings", "position" : "Center", "status" : "Former"} ] },

{ "player_name" : "Bob", "team_history" : [ { "team" : "Devils", "position" : "Defense", "status" : "Active" }, { "team" : "Maple Leafs", "position" : "Defence", "status" : "Former"}, { "team" : "Canadiens", "position" : "Defense", "status" : "Former"} ] }]

I would like to "explode" the contents of the team_history column to create a new dataframe named df_exploded with columns for only team and status that looks like this:

team          status
Rangers       Active
Blackhawks    Former
Kings         Former
Devils        Active
Maple Leafs   Former
Canadiens     Former

How would I create the desired df_exploded dataframe using the explode() function in Pyspark?

Thanks!


Solution

  • It seems that using explode and extracting the values you are interested in from the struct would do the trick:

    df\
      .select(F.explode("team_history").alias("s"))\
      .select("s.team", "s.status")\
      .show()
    
    +-----------+------+
    |team       |status|
    +-----------+------+
    |Rangers    |Active|
    |Blackhawks |Former|
    |Kings      |Former|
    |Devils     |Active|
    |Maple Leafs|Former|
    |Canadiens  |Former|
    +-----------+------+