Tags: python, apache-spark, pyspark, pie-chart, pyspark-pandas

Pie chart for pyspark.pandas.frame.DataFrame


How do I generate the same pie chart for a pyspark.pandas.frame.DataFrame?
I can't get the legend right.

piefreq=final_psdf['Target'].value_counts()
piefreq.plot.pie()

For pandas.core.frame.DataFrame, I managed to produce my desired pie chart using the following code:

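import plotly.graph_objects as go

piefreq = final_df['Target'].value_counts()

# List the values in the same order as the labels (1 = Yes, 0 = No),
# since value_counts() sorts by frequency rather than by class.
fig = go.Figure(data=[go.Pie(
    labels=['Yes (n=' + str(piefreq[1]) + ')', 'No (n=' + str(piefreq[0]) + ')'],
    values=[piefreq[1], piefreq[0]],
)])
fig.update_layout(title={'text': "<b>Pie chart by target</b>",
                         'y': 0.9,
                         'x': 0.45,
                         'xanchor': 'center',
                         'yanchor': 'top'})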

Solution

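  • I got this working by converting the pyspark.pandas.series.Series returned by value_counts() into a pyspark.pandas.frame.DataFrame with piefreq.reset_index(). A plot can be created from either a Series or a DataFrame, but the DataFrame form lets me pass the value column and custom legend names explicitly.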

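    piefreq = final_psdf['Target'].value_counts()
    # reset_index() turns the Series into a DataFrame; here the counts end up in a column named "Target".
    psdf_piefreq = piefreq.reset_index()
    # Rows are ordered by frequency, so row 0 (labelled "Yes" here) is the most common class.
    fig = psdf_piefreq.plot.pie(
        y="Target",
        names=[
            'Yes (n=' + str(psdf_piefreq['Target'][0]) + ')',
            'No (n=' + str(psdf_piefreq['Target'][1]) + ')',
        ],
    )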
    fig.update_layout(
        title_text='<b>Pie chart by target</b>',
        title_font=dict(size=16),
        title_x=0.45,
    )
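
Because value_counts() orders rows by frequency, hard-coding 'Yes' and 'No' by position only works while the class ordering stays the same. A minimal sketch of building the legend labels by class value instead is shown below. The helper name pie_labels, the display_names mapping, and the sample counts are all hypothetical and only for illustration; they are not part of pandas, pyspark.pandas, or plotly.

```python
from typing import Dict, Hashable, List, Tuple


def pie_labels(counts: Dict[Hashable, int],
               display_names: Dict[Hashable, str]) -> Tuple[List[str], List[int]]:
    """Return (labels, values) with each count paired to its own class.

    counts        -- class value -> frequency
    display_names -- class value -> legend text (hypothetical mapping)
    """
    labels, values = [], []
    for key, n in counts.items():
        # Fall back to the raw class value if no display name is given.
        name = display_names.get(key, str(key))
        labels.append(f"{name} (n={n})")
        values.append(n)
    return labels, values


# Made-up counts for illustration only: 0 = "No", 1 = "Yes".
labels, values = pie_labels({0: 70, 1: 30}, {1: "Yes", 0: "No"})
print(labels)  # ['No (n=70)', 'Yes (n=30)']
print(values)  # [70, 30]
```

With the objects from the question, piefreq.to_dict() should be able to supply counts, and the two returned lists could then be passed as the labels and values of the pie chart, assuming Target is coded as 0 and 1.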