Tags: apache-spark, apache-spark-sql, partitioning, orc

Spark DataFrame partition pruning on ORC files


We have a DataFrame with a Transaction Date column of type timestamp.

When we wrote the DataFrame out as ORC files, we partitioned on the date part of Transaction Date (the date only, not the full timestamp). To do this we created a separate field that holds just the date and partitioned on that field.

If we read the ORC files back with a where condition on the Transaction Date (timestamp) column, will Spark prune the partitions?


Solution

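  • No. Partition pruning only applies to predicates on the partition column itself, so you need to filter on the "separate" date field you partitioned by. Spark does not infer from a filter on the timestamp column which date partitions it can skip. This is the same basic rule that applies to partition pruning in databases generally.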
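
One way to apply this is to derive the partition filter from the timestamp range you actually care about, and then filter on both columns. The following is a minimal sketch in plain Python, not the asker's code: the column names `transaction_ts` and `transaction_date` and both helper functions are hypothetical, and it assumes the partition date was derived from the timestamp in the same time zone that the query uses.

```python
from datetime import date, datetime, timedelta
from typing import List


def partition_dates(ts_start: datetime, ts_end: datetime) -> List[date]:
    """Return every calendar date touched by the inclusive range [ts_start, ts_end]."""
    if ts_end < ts_start:
        raise ValueError("ts_end must not be earlier than ts_start")
    days = (ts_end.date() - ts_start.date()).days
    return [ts_start.date() + timedelta(days=i) for i in range(days + 1)]


def pruning_predicate(
    ts_start: datetime,
    ts_end: datetime,
    ts_col: str = "transaction_ts",      # hypothetical timestamp column name
    part_col: str = "transaction_date",  # hypothetical partition column name
) -> str:
    """Build a SQL predicate that filters on the partition column AND the timestamp.

    The partition-column part is what lets partition pruning apply; the
    timestamp part then narrows the rows inside the surviving partitions.
    """
    dates = ", ".join(
        f"DATE '{d.isoformat()}'" for d in partition_dates(ts_start, ts_end)
    )
    return (
        f"{part_col} IN ({dates}) "
        f"AND {ts_col} BETWEEN TIMESTAMP '{ts_start.isoformat(sep=' ')}' "
        f"AND TIMESTAMP '{ts_end.isoformat(sep=' ')}'"
    )
```

The resulting string can be passed to `DataFrame.where`, for example `spark.read.orc(path).where(pruning_predicate(start, end))`, where `path`, `start`, and `end` are placeholders. Whether pruning actually took place is best confirmed by looking at the partition filters in the output of `explain()` on the resulting DataFrame.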