I have this DataFrame:
+---------+
|     data|
+---------+
|[a, b, c]|
|[d, e, f]|
|[g, h, i]|
+---------+
And a list of column names: ["first col", "second col", "third col"]
I want to create new columns to produce the following DataFrame:
+---------+----------+---------+
|first col|second col|third col|
+---------+----------+---------+
|        a|         b|        c|
|        d|         e|        f|
|        g|         h|        i|
+---------+----------+---------+
I'm scratching my head over how to do that. What would be the correct way to achieve this?
Untested code, but the idea is to use getItem() to access the i-th element of the data column (which in your case is an array) and store each element in a new column created with withColumn:
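from pyspark.sql.functions import col

# Assumes an active SparkSession named `spark`
df = spark.createDataFrame([(['a', 'b', 'c'],), (['d', 'e', 'f'],), (['g', 'h', 'i'],)], ['data'])
col_names = ['first col', 'second col', 'third col']
# Add one column per name, holding the i-th element of the array
for i, name in enumerate(col_names):
    df = df.withColumn(name, col('data').getItem(i))
# Drop the original array column
df = df.drop('data')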