scala, databricks, azure-databricks, spark-koalas

Impossible to import koalas in scala notebook


It seems basic, but nothing I have found on the Databricks website works on my side.

I have installed the koalas package on my cluster, but when I try to import it in my Scala notebook, I get an error:

command-3313152839336470:1: error: not found: value databricks
import databricks.koalas

If I do the same import in Python, everything works fine.

Details cluster & notebook

Thanks for your help Matt


Solution

  • Koalas is a Python package that mimics the interfaces of pandas (another Python package). No Scala version is currently published, even though the project may contain some Scala code. The goal of Koalas is to provide a drop-in replacement for pandas that makes use of the distributed nature of Apache Spark. Since pandas is only available for Python, I don't expect a direct port of it to Scala. That is why the import fails in a Scala notebook while the same import works in Python.

    https://github.com/databricks/koalas

    For Scala, your best bet is to use the Dataset and DataFrame APIs of Spark:

    https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Dataset.html

    https://databricks.com/blog/2016/01/04/introducing-apache-spark-datasets.html
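
To give an idea of what that looks like, here is a minimal sketch of two pandas-style operations (a row filter and a group-by mean) written against the Spark DataFrame API in Scala. The sample data, the column names `key` and `value`, and the name `session` are made up for illustration. It assumes the code runs in a notebook cell or `spark-shell`, where a Spark session is already available and `getOrCreate()` simply reuses it.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col}

// Reuse the session the notebook already provides (assumption: one exists).
val session = SparkSession.builder().getOrCreate()
import session.implicits._

// Hypothetical sample data, only for illustration.
val df = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("key", "value")

// Roughly what pandas/Koalas would write as df[df.value > 1].
val filtered = df.filter(col("value") > 1)

// Roughly what pandas/Koalas would write as df.groupby("key").value.mean().
val means = df.groupBy("key").agg(avg("value").as("mean_value"))

filtered.show()
means.show()
```

The syntax is not a drop-in match for pandas, but the operations are distributed across the cluster in the same way Koalas operations are, since Koalas is itself built on top of Spark DataFrames.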