python, pandas, apache-spark, pyspark, databricks

Anyone know how to display a pandas dataframe in Databricks?


Previously I had a pandas dataframe that I could display as a table in Databricks using:

df.display()

Pandas was updated to v2.0.0 today and I am now getting the following error when I run df.display():

AttributeError: 'DataFrame' object has no attribute 'iteritems'

Anyone know how I can resolve this?

I tried running df.display (without parentheses) and it gives an output, but I am looking for output in tabular form.


Solution

  • As a workaround, downgrade to pandas v1.5

    %pip install --upgrade pandas==1.5
    
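    If pandas 2.0.0 has already been imported in the current notebook session, the downgrade may not take effect until the notebook's Python process is restarted. A minimal sketch, assuming a recent Databricks Runtime where dbutils.library.restartPython() is available:

    # Run in a new cell after the %pip install above: restart this notebook's
    # Python process so the downgraded pandas is the one that gets imported.
    dbutils.library.restartPython()

    # In the next cell, confirm the active version before retrying df.display().
    import pandas as pd
    print(pd.__version__)  # should print 1.5.x
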

    The answers posted so far worked prior to 3 April 2023, before pandas 2.0.0 was released.

    As of 4 April, with pandas 2.0.0, you can no longer convert a pandas DataFrame to a Spark DataFrame using:

    spark.createDataFrame(df)
    

    Using the above command leads to the error mentioned in the question:

    AttributeError: 'DataFrame' object has no attribute 'iteritems'
    

    The iteritems method was removed in pandas 2.0.0. From the pandas 2.0.0 changelog:

    Removed deprecated Series.iteritems(), DataFrame.iteritems(), use obj.items instead
    
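    In your own code the fix is a one-line rename. A small, self-contained illustration (the DataFrame below is made up for the example):

    import pandas as pd

    pdf = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

    # pandas < 2.0:  for col, series in pdf.iteritems(): ...
    # pandas >= 2.0: iteritems() is gone; items() yields the same (column, Series) pairs.
    for col, series in pdf.items():
        print(col, series.tolist())
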

    Meanwhile, the Spark code that converts a pandas DataFrame to a Spark DataFrame still calls iteritems:

    /databricks/spark/python/pyspark/sql/pandas/conversion.py in createDataFrame(self, data, schema, samplingRatio, verifySchema)
        308                     warnings.warn(msg)
        309                     raise
    --> 310         data = self._convert_from_pandas(data, schema, timezone)
        311         return self._create_dataframe(data, schema, samplingRatio, verifySchema)
        312 
    
    /databricks/spark/python/pyspark/sql/pandas/conversion.py in _convert_from_pandas(self, pdf, schema, timezone)
        340                             pdf[field.name] = s
        341             else:
    --> 342                 for column, series in pdf.iteritems():
        343                     s = _check_series_convert_timestamps_tz_local(series, timezone)
        344                     if s is not series:
    

    It looks like we will have to wait for a fix before pandas 2.0.0 can be used here.
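
    If downgrading is not an option, another stopgap sometimes used (not part of this answer, and it patches pandas globally for the session, so treat it as a sketch) is to alias the removed methods back to items() before converting, since Spark's conversion path only needs the old name. Here spark and df are assumed to be the notebook's SparkSession and the pandas DataFrame from the question:

    import pandas as pd

    # pandas 2.0 removed DataFrame.iteritems() and Series.iteritems();
    # items() returns the same (label, value) pairs, so re-point the old names at it.
    if not hasattr(pd.DataFrame, "iteritems"):
        pd.DataFrame.iteritems = pd.DataFrame.items
    if not hasattr(pd.Series, "iteritems"):
        pd.Series.iteritems = pd.Series.items

    sdf = spark.createDataFrame(df)  # no longer raises AttributeError
    sdf.display()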