I would like to know the best way to load a specific partition of a Delta table. Is option 2 loading the whole table before filtering?
Option 1:

df = spark.read.format("delta").option("basePath", "/mnt/raw/mytable/") \
    .load("/mnt/raw/mytable/ingestdate=20210703")

(Is the basePath option needed here?)
Option 2:

from pyspark.sql.functions import col

df = spark.read.format("delta").load("/mnt/raw/mytable/")
df = df.filter(col("ingestdate") == "20210703")
Many thanks in advance!
In the second option, Spark loads only the partitions that match the filter condition: internally, Spark performs partition pruning and reads only the relevant data from the source table.
Whereas in the first option, you directly instruct Spark to load only the specified partition path. So in both cases, you end up reading only the data for the relevant partition. For a Delta table, the filter approach is generally preferable, since the table's transaction log lives at the root path; the basePath option (a file-source partition-discovery setting) is not needed when filtering on the partition column.