Previously I had a pandas dataframe that I could display as a table in Databricks using:
df.display()
Pandas was updated to v2.0.0 today, and I am now getting the following error when I run df.display():
AttributeError: 'DataFrame' object has no attribute 'iteritems'
Does anyone know how I can resolve this?
I tried running df.display (without parentheses). It produces output, but not the tabular form I am looking for.
As a workaround, downgrade to pandas v1.5:
%pip install --upgrade pandas==1.5
As of April 4, with pandas 2.0.0, you cannot convert a pandas DataFrame to a Spark DataFrame using the command:
spark.createDataFrame(df)
Using the above command leads to the error mentioned in the question:
AttributeError: 'DataFrame' object has no attribute 'iteritems'
The iteritems function was removed in pandas 2.0.0. From the pandas 2.0.0 changelog:
Removed deprecated Series.iteritems(), DataFrame.iteritems(), use obj.items instead
However, the Spark code that converts a pandas DataFrame to a Spark DataFrame still calls iteritems, as the traceback shows:
/databricks/spark/python/pyspark/sql/pandas/conversion.py in createDataFrame(self, data, schema, samplingRatio, verifySchema)
308 warnings.warn(msg)
309 raise
--> 310 data = self._convert_from_pandas(data, schema, timezone)
311 return self._create_dataframe(data, schema, samplingRatio, verifySchema)
312
/databricks/spark/python/pyspark/sql/pandas/conversion.py in _convert_from_pandas(self, pdf, schema, timezone)
340 pdf[field.name] = s
341 else:
--> 342 for column, series in pdf.iteritems():
343 s = _check_series_convert_timestamps_tz_local(series, timezone)
344 if s is not series:
It looks like we will have to wait for a fix on the Spark side before this conversion works with pandas 2.0.0.
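Until such a fix lands, a possible stopgap other than downgrading is to put iteritems back as an alias of items, since the changelog names items as the replacement. The sketch below is only an assumption about how that could be done; the helper names alias_removed_method and patch_pandas_for_spark are made up for illustration and are not part of pandas or Spark.

```python
def alias_removed_method(cls, removed_name, replacement_name):
    """Re-add `removed_name` on `cls` as an alias of `replacement_name`.

    Does nothing if `cls` already has `removed_name`, so it is harmless on
    pandas versions older than 2.0.0. Returns True if an alias was added.
    """
    if hasattr(cls, removed_name):
        return False
    setattr(cls, removed_name, getattr(cls, replacement_name))
    return True


def patch_pandas_for_spark():
    """Restore iteritems on pandas objects (hypothetical helper name)."""
    import pandas as pd  # imported here so the helper above stays generic

    alias_removed_method(pd.DataFrame, "iteritems", "items")
    alias_removed_method(pd.Series, "iteritems", "items")
```

The idea would be to call patch_pandas_for_spark() once per session, before spark.createDataFrame(df). This only papers over the missing method; other pandas 2.0.0 changes may still affect the conversion, so treat it as temporary.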