I'm working in Databricks. I'm reading my Delta table like this:
path = "/root/data/foo/year=2021/"
df = spark.read.format("delta").load(path)
However, within the year=2021 folder there are sub-folders for each day: day=01, day=02, day=03, and so on.
How can I read only the folders for days 4, 5 and 6, for example?
Edit #1:
Reading answers to similar questions, it seems the proper way to do this is to load the whole table and apply a filter on the partition columns:
# load the table root, then filter on the partition columns
df = spark.read.format("delta").load('/whatever/path')
df2 = df.filter("year = '2021' and day in ('04', '05', '06')")
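As a minimal sketch of what that looks like for the table from the question, assuming /root/data/foo is the table root (the folder containing _delta_log) and that year and day are string-typed partition columns, explain() can be used to check that the predicates are applied as partition filters, meaning only the matching day folders get scanned:

# Sketch: read from the table root, not a partition sub-folder
df = spark.read.format("delta").load("/root/data/foo")
df2 = df.filter("year = '2021' and day in ('04', '05', '06')")

# The scan node of the physical plan should list the year/day predicates
# under PartitionFilters, i.e. only the matching day=... folders are read.
df2.explain()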