Tags: pyspark, databricks, delta-lake, pyspark-pandas

'DataFrame' object has no attribute 'to_delta'


My code used to work, but it stopped working after I upgraded to Databricks Runtime 10.2, which required changing some earlier code to use the pandas API on Spark. Why does it fail now?

# Drop customer ID for AutoML
automlDF = churn_features_df.drop(key_id)

# Write out silver-level data to autoML Delta lake
automlDF.to_delta(mode='overwrite', path=automl_silver_tbl_path)

The error I am getting is: 'DataFrame' object has no attribute 'to_delta'


Solution

  • I was able to get it to work as expected by calling to_pandas_on_spark() first. to_delta() is a method of the pandas-on-Spark DataFrame (pyspark.pandas), not of the regular PySpark DataFrame, so the DataFrame has to be converted before that method is available. My working code looks like this:

    # Drop customer ID for AutoML
    automlDF = churn_features_df.drop(key_id).to_pandas_on_spark()
    
    # Write out silver-level data to autoML Delta lake
    automlDF.to_delta(mode='overwrite', path=automl_silver_tbl_path)
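
    If you would rather not convert to pandas-on-Spark at all, the native Spark Delta writer should also work on Databricks Runtime 10.2. The sketch below reuses the variable names from the question (churn_features_df, key_id, automl_silver_tbl_path); I have not run it against your data, so treat it as an untested alternative rather than the required fix.

    # Alternative: write the Delta table with the native Spark writer,
    # keeping the DataFrame as a regular PySpark DataFrame throughout
    silver_df = churn_features_df.drop(key_id)

    silver_df.write.format("delta").mode("overwrite").save(automl_silver_tbl_path)

    # Optional sanity check: read the table back with the pandas-on-Spark API
    import pyspark.pandas as ps
    check_df = ps.read_delta(automl_silver_tbl_path)
    print(check_df.head())

    Either way the data ends up in the same Delta location; the pandas-on-Spark route is only needed if you want to keep working with the pandas-style API afterwards.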