Tags: pyspark, databricks, azure-databricks, azure-data-lake

Read Delta table from multiple folders


I'm working in Databricks. I'm reading my Delta table like this:

path = "/root/data/foo/year=2021/"
df = spark.read.format("delta").load(path)

However, within the year=2021 folder there are sub-folders for each day: day=01, day=02, day=03, etc.

How can I read the folders for days 4, 5, and 6, for example?

Edit #1

I've been reading answers to other questions, and it seems the proper way to achieve this is to apply a filter on the partition column.


Solution

  • It seems the best way to read a partitioned Delta table is to apply a filter on the partition columns:

    # Load the table from its root path; filtering on the partition
    # columns lets Delta prune which year/month/day folders are read
    df = spark.read.format("delta").load('/whatever/path')
    df2 = df.filter("year = '2021' and month = '01' and day in ('04','05','06')")
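
    The same filter can also be written with column expressions instead of a SQL string. This is a minimal sketch assuming the same year/month/day partition layout and an active SparkSession named spark:

    from pyspark.sql.functions import col

    df = spark.read.format("delta").load('/whatever/path')

    # Boolean column expressions are combined with &; because year, month,
    # and day are partition columns, Spark only scans the matching folders
    df2 = df.filter(
        (col("year") == "2021")
        & (col("month") == "01")
        & (col("day").isin("04", "05", "06"))
    )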