I am trying to convert a spark dataframe to pandas dataframe on Azure databricks. But I get the following error:
Exception: arrow is not supported when using file-based collect
I have tried the reference code using the link: https://docs.databricks.com/spark/latest/spark-sql/spark-pandas.html
First I read csv file using the following line:
#read file
df1 = spark.read.csv('/mnt/test/sample.csv', header = True)
Next I try to convert this to pandas dataframe using this code below:
# Enable Arrow-based columnar data transfers
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
# Convert the Spark DataFrame to a Pandas DataFrame
pandas_df = df1.select("*").toPandas()
But while doing this I get this error: Exception: arrow is not supported when using file-based collect
Here is the full expansion of the error message:
Exception: arrow is not supported when using file-based collect
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<command-3700677352136413> in <module>()
2 spark.conf.set("spark.sql.execution.arrow.enabled", "true")
3 # Convert the Spark DataFrame to a Pandas DataFrame
----> 4 pandas_df = df1.select("*").toPandas()
/databricks/spark/python/pyspark/sql/dataframe.py in toPandas(self)
2169 _check_dataframe_localize_timestamps
2170 import pyarrow
-> 2171 batches = self._collectAsArrow()
2172 if len(batches) > 0:
2173 table = pyarrow.Table.from_batches(batches)
/databricks/spark/python/pyspark/sql/dataframe.py in _collectAsArrow(self)
2225 """
2226 if self._sc._conf.get(self._sc._jvm.PythonSecurityUtils.USE_FILE_BASED_COLLECT()):
-> 2227 raise Exception("arrow is not supported when using file-based collect")
2228 with SCCallSiteSync(self._sc) as css:
2229 sock_info = self._jdf.collectAsArrowToPython()
Exception: arrow is not supported when using file-based collect
Can someone please help?
I finally figured out the solution. It was the Runtime version of the cluster that had to be changed. Created a new cluster and tested with it Runtime Version 5.5 and it seemed to run fine.