databricks, databricks-unity-catalog

Need for volumes in Databricks


Why do we need Volumes when we can access the location using external locations? The doc says that it is to add governance, but we can already govern using external locations. So, why add another layer of governance? I am guessing that instead of giving access to the entire external location, we can provide access to a specific subfolder of the location using volumes.


Solution

  • External Locations are the administrative component, while Volumes are the user-facing objects that end users work with. Volumes are at the same level as Tables, while External Locations are one level below.

    An External Location maps an authentication mechanism (a Storage Credential) to a specific cloud storage path, such as an ADLS container/path. Setting this up is a task for an administrator. The administrator then grants end users permission to use the External Location to create Volumes or Tables. A Volume is used for unstructured or semi-structured data, while a Table is used for tabular data.

    (I work for Databricks)
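
To make the split between the two roles concrete, here is a minimal sketch of the flow described above, written as plain Python that only builds the SQL strings (it does not connect to a workspace). All names are hypothetical placeholders: the location `landing_loc`, the credential `my_cred`, the `abfss://` URL, the volume `main.raw.invoices`, and the groups `data-engineers` and `analysts`. It assumes the volume points at a subfolder of the external location, which is the case the question guesses at. Other privileges, such as those on the catalog and schema, are likely also needed and are left out here.

```python
def admin_statements(location, url, credential, creator):
    """SQL an administrator might run: register the path, then delegate."""
    return [
        # Bind a storage credential to a cloud path (hypothetical names).
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {location} "
        f"URL '{url}' WITH (STORAGE CREDENTIAL {credential})",
        # Let a group create volumes under that location.
        f"GRANT CREATE EXTERNAL VOLUME ON EXTERNAL LOCATION {location} "
        f"TO `{creator}`",
    ]


def volume_statements(volume, url, subpath, reader):
    """SQL a delegated user might run: expose one subfolder as a volume."""
    volume_url = f"{url.rstrip('/')}/{subpath.strip('/')}"
    return [
        f"CREATE EXTERNAL VOLUME IF NOT EXISTS {volume} LOCATION '{volume_url}'",
        # Readers get access to the volume only, not the whole location.
        f"GRANT READ VOLUME ON VOLUME {volume} TO `{reader}`",
    ]


def volume_path(volume, relative_path):
    """Path end users use for files in a volume: /Volumes/<catalog>/<schema>/<volume>/..."""
    catalog, schema, name = volume.split(".")
    return f"/Volumes/{catalog}/{schema}/{name}/{relative_path.lstrip('/')}"


if __name__ == "__main__":
    url = "abfss://container@account.dfs.core.windows.net/landing"  # hypothetical
    for stmt in admin_statements("landing_loc", url, "my_cred", "data-engineers"):
        print(stmt)
    for stmt in volume_statements("main.raw.invoices", url, "invoices", "analysts"):
        print(stmt)
    print(volume_path("main.raw.invoices", "2024/a.pdf"))
```

In this sketch the `analysts` group never touches the External Location or the storage URL; they only see the volume and its `/Volumes/...` path, which is the user-facing side the answer refers to.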