Tags: apache-spark, pyspark, azure-databricks, delta-lake

Unable to read Databricks Delta / Parquet File with Delta Format


I am trying to read a Delta/Parquet file in Databricks using the following code:

df3 = spark.read.format("delta").load('/mnt/lake/CUR/CURATED/origination/company/opportunities_final/curorigination.presentation.parquet')

However, I'm getting the following error:

A partition path fragment should be the form like `part1=foo/part2=bar`. The partition path: curorigination.presentation.parquet

This seemed very straightforward, but I'm not sure why I'm getting the error.

Any thoughts?

The file structure looks like the following: [screenshot of the mounted directory]


Solution

  • The error shows that Delta Lake thinks your partition path naming is wrong: it interprets the final path segment as a partition directory, which must have the form `part1=foo/part2=bar`.

    If your Delta table has partition columns, for example `year`, `month`, and `day`, the partition directories under the table path would look like

    /mnt/lake/CUR/CURATED/origination/company/opportunities_final/year=yyyy/month=mm/day=dd/curorigination.presentation.parquet
    

    and, whether or not the table is partitioned, you just need to load the table's root directory:

    df = spark.read.format("delta").load("/mnt/lake/CUR/CURATED/origination/company/opportunities_final")
    

    If you want to read it as plain Parquet instead, you can do

    df = spark.read.parquet("/mnt/lake/CUR/CURATED/origination/company/opportunities_final")
    

    because in both cases you point Spark at the table's directory, not at the absolute path of an individual parquet file inside it.
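As a side note, you can tell whether a directory is a Delta table by checking for a `_delta_log` subdirectory, which Delta Lake keeps at the table root. Below is a minimal sketch of that check for a local filesystem path; the helper name `guess_read_format` is my own invention, and on DBFS mounts you would list the directory with `dbutils.fs.ls` (or use `DeltaTable.isDeltaTable`) rather than `os.path`:

```python
import os

def guess_read_format(table_dir: str) -> str:
    """Hypothetical helper: a directory containing a `_delta_log`
    subdirectory is a Delta table; otherwise assume plain Parquet.
    Works for local paths only; adapt for DBFS/cloud storage."""
    if os.path.isdir(os.path.join(table_dir, "_delta_log")):
        return "delta"
    return "parquet"

# Usage (assuming a running Spark session):
# path = "/mnt/lake/CUR/CURATED/origination/company/opportunities_final"
# df = spark.read.format(guess_read_format(path)).load(path)
```

Either way, the path you pass to `load()` should be the table directory itself, never a file inside it.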