Tags: jupyter-notebook, plotly, amazon-emr, pyspark

Plotly visualization not working in PySpark kernel on EMR JupyterHub notebook


I'm trying to plot graphs with plotly in an EMR JupyterHub notebook, but the graphs are not rendered in the PySpark kernel. (Note: the Python kernel renders the same graphs just fine.)

Sample code I am trying:

import plotly.express as px

# Built-in gapminder sample data, filtered to a single country
data_canada = px.data.gapminder().query("country == 'Canada'")
fig = px.bar(data_canada, x='year', y='pop')
fig.show()

I can plot a graph with the %%display sparkmagic magic, but I can't figure out whether plotly can be made to work with it:

import random

data = [('Person:%s' % i, i, random.randint(1, 5)) for i in range(1, 50)]
columns = ['Name', 'Age', 'Random']
spark_df = spark.createDataFrame(data, columns)

%%display
spark_df

Has anyone tried this successfully? Please advise.


Solution

  • This is a limitation of sparkmagic. You have to fall back to the %%local magic. From the sparkmagic docs:

    Since all code is run on a remote driver through Livy, all structured data must be serialized to JSON and parsed by the Sparkmagic library so that it can be manipulated and visualized on the client side. In practice this means that you must use Python for client-side data manipulation in %%local mode.