Tags: python, pyspark, databricks, analytics

Integrating PySpark and Python in the same Notebook


I work on the Analytics team at X company. We use Microsoft Azure Databricks, where we have to use PySpark. Let's say that, after several processing steps, we end up with a final DataFrame, and I need to build visualisations based on it. I think the Seaborn library from Python would be more useful for data visualization than anything PySpark offers. Is there a way to integrate both in the same notebook?

Thanks for your answer.


Solution

  • Databricks includes extra Python libraries natively, so Seaborn, which you mentioned, will work out of the box on up-to-date runtime releases. Depending on whether you use an ML Databricks runtime or a regular one, the runtime includes a different set of extra Python libraries. You can find the complete list in the documentation; I am attaching a link to the currently newest runtime version (11.3 LTS). In practice, you convert the final Spark DataFrame to pandas and hand it to Seaborn, as sketched below.

    If you want to see an example Databricks notebook that integrates these libraries, this one should give you some hints on how to start.
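
    Since Seaborn works on pandas DataFrames rather than Spark DataFrames, the usual pattern is to collect the final (typically aggregated, hence small) Spark result to the driver with `.toPandas()` and plot from there. Below is a minimal sketch, assuming it runs in a Databricks notebook where the built-in `spark` session exists; the DataFrame contents and column names (`category`, `value`) are made up for illustration.

    ```python
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Example Spark DataFrame standing in for your final result
    sdf = spark.createDataFrame(
        [("a", 1.0), ("a", 2.5), ("b", 3.0), ("b", 4.5)],
        ["category", "value"],
    )

    # Seaborn expects pandas, so collect the (small) Spark result to the driver
    pdf = sdf.toPandas()

    # Plain Seaborn call; Databricks renders the Matplotlib figure inline
    sns.barplot(data=pdf, x="category", y="value")
    plt.show()
    ```

    Keep in mind that `.toPandas()` pulls all rows into driver memory, so aggregate or sample the DataFrame in Spark first if it is large.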