Tags: python, matplotlib, pyspark, apache-zeppelin

pyspark matplotlib integration with Zeppelin


I'm trying to draw a histogram using pyspark in a Zeppelin notebook. Here is what I have tried so far:

%pyspark

import matplotlib.pyplot as plt
import pandas
...
x=dateDF.toPandas()["year(CAST(_c0 AS DATE))"].values.tolist()
y=dateDF.toPandas()["count(year(CAST(_c0 AS DATE)))"].values.tolist()
plt.plot(x,y)
plt.show()

This code runs without errors, but it does not produce the expected plot. So I googled and found this documentation:

[Image: screenshot of the documentation]

Following it, I tried to enable the angular flag as follows:

x=dateDF.toPandas()["year(CAST(_c0 AS DATE))"].values.tolist()
y=dateDF.toPandas()["count(year(CAST(_c0 AS DATE)))"].values.tolist()
plt.close()
z.configure_mpl(angular=True,close=False)
plt.plot(x,y)
plt.show()

But now I'm getting the error No module named 'mpl_config', and I have no idea how to enable angular without it. Any suggestions on how to resolve this would be greatly appreciated.


Solution

  • In Zeppelin 0.10.0 I was able to plot a matplotlib plot as simply as this in a %pyspark interpreter:

    import matplotlib.pyplot as plt
    
    x = list(range(10))
    y = [value * 25 for value in x]  # y = 25 * x for each point
    
    plt.close()  # Close any existing plot when re-running this paragraph.
    plt.xlabel('x', fontsize=20)
    plt.ylabel('y', fontsize=20)
    plt.grid()
    plt.title('Inline plotting example', fontsize=20)
    plt.plot(x,y)
    plt.show()
    

    Output in Zeppelin:

    [Image: the plot rendered in Zeppelin]
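
To apply the same approach to the data in the question, here is a rough sketch. It assumes dateDF has exactly two columns, the year and its count, in that order. The names split_year_counts, plot_year_counts and sample_rows are made up for illustration and are not part of Zeppelin or pyspark. It collects the rows once instead of calling toPandas() twice, and it uses plt.bar rather than plt.plot, since one bar per year is closer to a histogram.

```python
def split_year_counts(rows):
    """Turn (year, count) pairs into two parallel lists sorted by year.

    `rows` can be any iterable of 2-tuples. pyspark Row objects returned by
    DataFrame.collect() are tuples, so they should work here as well.
    """
    ordered = sorted(rows, key=lambda pair: pair[0])
    years = [year for year, _ in ordered]
    counts = [count for _, count in ordered]
    return years, counts


def plot_year_counts(rows):
    """Draw one bar per year in the current notebook paragraph."""
    # Imported here so the helper above has no plotting dependency.
    import matplotlib.pyplot as plt

    years, counts = split_year_counts(rows)
    plt.close()  # Close any existing plot when re-running this paragraph.
    plt.bar(years, counts)
    plt.xlabel('year')
    plt.ylabel('count')
    plt.show()


# Stand-in data (hypothetical); in the notebook this would be dateDF.collect().
sample_rows = [(2019, 7), (2017, 3), (2018, 5)]
years, counts = split_year_counts(sample_rows)
print(years, counts)
```

In a %pyspark paragraph, calling plot_year_counts(dateDF.collect()) should then render the chart the same way as the example above, provided the Zeppelin version displays matplotlib output inline.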