apache-spark-sql, databricks

What does "Determining location of DBIO file fragments..." mean, and how do I speed it up?


When running simple SQL commands in Databricks, sometimes I get the message:

Determining location of DBIO file fragments. This operation can take some time.

What does this mean, and how do I prevent it from having to perform this apparently-expensive operation every time? This happens even when all the underlying tables are Delta tables.


Solution

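  • That is a message about the Delta cache. It is determining which executors hold which cached file fragments, so that tasks can be routed to the executors with the best cache locality. The more files a table has, the longer this lookup takes, so optimizing your table more frequently (which compacts it into fewer files) will reduce the time spent on it.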
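As a rough illustration of the "optimize more frequently" advice, here is a minimal sketch that builds and runs the Delta `OPTIMIZE` command from a Databricks notebook. The table name `my_db.events` and the helper names are hypothetical, and `spark` is assumed to be the SparkSession that Databricks notebooks provide; adapt them to your own setup.

```python
import re

# Hypothetical table name, used only for illustration.
TABLE_NAME = "my_db.events"

# Accept plain or dot-qualified identifiers (table, schema.table, catalog.schema.table).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){0,2}$")


def optimize_statement(table_name: str) -> str:
    """Build an OPTIMIZE statement, which compacts a Delta table's small files."""
    if not _IDENT.match(table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    return f"OPTIMIZE {table_name}"


def compact_table(spark, table_name: str) -> None:
    """Run OPTIMIZE through an existing SparkSession (assumed: `spark` in a notebook)."""
    spark.sql(optimize_statement(table_name))


# Show the statement that would be sent to the cluster.
print(optimize_statement(TABLE_NAME))
```

In a notebook you would call `compact_table(spark, "my_db.events")`, for example from a scheduled job that runs after large writes. How often to run it depends on how quickly small files accumulate in the table.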