python, dictionary, apache-spark, pyspark

Convert pyspark.sql.dataframe.DataFrame type Dataframe to Dictionary


I have a PySpark DataFrame and I need to convert it into a Python dictionary.

The code below reproduces the setup:

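from pyspark.sql import Row, SparkSession
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext  # already defined as `sc` inside the pyspark shell
rdd = sc.parallelize([Row(name='Alice', age=5, height=80), Row(name='Alice', age=5, height=80), Row(name='Alice', age=10, height=80)])
df = rdd.toDF()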

Once I have this dataframe, I need to convert it into a dictionary.

I tried this:

df.set_index('name').to_dict()

But it raises an error. How can I achieve this?


Solution

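  • set_index() and to_dict() are pandas methods, not PySpark ones, so you first need to convert to a pandas.DataFrame using toPandas(). Then you can call to_dict() on the transposed dataframe with orient='list'. Keep in mind that toPandas() collects the whole dataframe onto the driver, and that every row in the example has the name 'Alice', so the duplicate keys collapse and only the last row survives: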

    df.toPandas().set_index('name').T.to_dict('list')
    # Out[1]: {u'Alice': [10, 80]}
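
If you need to keep every row for a repeated name rather than only the last one, something like the following might work. This is only a sketch: rows_to_dict is a hypothetical helper name, not part of PySpark or pandas, and the plain dicts below stand in for what Row.asDict() would return for the question's data.

```python
from collections import defaultdict

def rows_to_dict(rows, key):
    """Group row mappings by `key`, keeping each row's other values as a list.

    Hypothetical helper for illustration; not a PySpark or pandas API.
    """
    grouped = defaultdict(list)
    for row in rows:
        # Keep every value except the key column, in the row's own field order.
        grouped[row[key]].append([v for k, v in row.items() if k != key])
    return dict(grouped)

# Plain dicts standing in for Row.asDict() results from the question's data.
sample = [
    {"name": "Alice", "age": 5, "height": 80},
    {"name": "Alice", "age": 5, "height": 80},
    {"name": "Alice", "age": 10, "height": 80},
]
result = rows_to_dict(sample, "name")
print(result)  # {'Alice': [[5, 80], [5, 80], [10, 80]]}
```

With the question's dataframe, the input could be built as rows_to_dict((r.asDict() for r in df.collect()), 'name'). Like toPandas(), collect() pulls all rows onto the driver, so this is only reasonable for data that fits in memory.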