I am trying to read a Delta / Parquet table in Databricks using the following code:
df3 = spark.read.format("delta").load('/mnt/lake/CUR/CURATED/origination/company/opportunities_final/curorigination.presentation.parquet')
However, I'm getting the following error:
A partition path fragment should be the form like `part1=foo/part2=bar`. The partition path: curorigination.presentation.parquet
This seemed very straightforward, so I'm not sure why I'm getting the error.
Any thoughts?
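The error means that Delta Lake is treating the last part of your path, curorigination.presentation.parquet, as a partition path fragment, and it is not in the column=value form that Delta expects. With format("delta"), load() should be given the table directory, not an individual parquet file inside it.
If your Delta table has partition columns, for example year, month and day, the data files sit under partition directories like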
/mnt/lake/CUR/CURATED/origination/company/opportunities_final/year=yyyy/month=mm/day=dd/curorigination.presentation.parquet
Either way, you just need to load the table directory:
df = spark.read.format("delta").load("/mnt/lake/CUR/CURATED/origination/company/opportunities_final")
If you just want to read it as plain Parquet, you can do
df = spark.read.parquet("/mnt/lake/CUR/CURATED/origination/company/opportunities_final")
because Spark reads every parquet file under that directory, so you don't need to give the full path to an individual parquet file.
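If you are not sure which directory is the table root, here is a minimal sketch that walks up from any path and returns the first directory containing a _delta_log folder, which is what marks a Delta table. The helper name find_delta_root is hypothetical (not part of Spark or Delta), and it assumes the path is reachable through ordinary local file APIs.

```python
import os
from typing import Optional


def find_delta_root(path: str) -> Optional[str]:
    """Hypothetical helper: walk up from `path` and return the first
    directory that contains a `_delta_log` folder, or None if none is found."""
    current = os.path.abspath(path)
    while True:
        # A Delta table root is the directory that holds `_delta_log`.
        if os.path.isdir(os.path.join(current, "_delta_log")):
            return current
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the filesystem root without finding a Delta log.
            return None
        current = parent
```

On Databricks, mounted storage is usually visible to local file APIs under /dbfs, so you would likely call it with /dbfs/mnt/lake/... and then drop the /dbfs prefix before passing the result to spark.read.format("delta").load(...). Treat that as an assumption about your workspace setup.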