
Where is Databricks DBFS located?


I have read through the documentation, but I don't see much technical detail on DBFS. Is this a hosted service, or is it in the client's account? I assume it's not hosted, but I can't find it in my Azure account or my AWS account. I'm very interested in how this is set up and what technical details I can provide to clients. The most technical detail I can find is that there is a 2 GB file limit.


Solution

  • DBFS (Databricks File System) is an abstraction layer over underlying cloud storage, which may be of different types. When people refer to DBFS, they usually mean one of two things:

    1. DBFS Root: the main entry point of DBFS (/, /tmp, etc.). On AWS you need to provision it yourself as an S3 bucket. On Azure it's created during workspace creation as a dedicated, isolated storage account in a separate managed resource group. You can't update the settings of that storage account after it's created, or access it directly. That's why it's recommended not to store critical data in the DBFS Root.

    2. Other storage accounts (or S3 and GCS buckets) that are mounted into the workspace. Although mounted storage is convenient to work with, be aware that these mounts are accessible to all users in the workspace (except so-called passthrough mounts).
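To make the "abstraction over underlying storage" idea concrete, here is a minimal Python sketch of how a DBFS path could be resolved against a mount table: a path under a mount point maps to the mounted external storage, and anything else falls through to the DBFS Root. This is not Databricks' actual implementation. The `Mount` class, the `resolve_dbfs_path` function, and the storage account and container names are all hypothetical; on a real cluster, `dbutils.fs.mounts()` lists the workspace's actual mount table.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Mount:
    # Hypothetical stand-in for one mount-table entry:
    # a DBFS mount point plus the storage URI it points to.
    mount_point: str  # DBFS path, e.g. "/mnt/raw"
    source: str       # underlying storage URI


# Hypothetical mount table; account and container names are made up.
EXAMPLE_MOUNTS = [
    Mount("/mnt/raw", "abfss://raw@examplestorage.dfs.core.windows.net/"),
    Mount("/mnt/curated", "abfss://curated@examplestorage.dfs.core.windows.net/"),
]


def resolve_dbfs_path(path: str, mounts: List[Mount]) -> Tuple[str, str]:
    """Return ("mount", storage_uri) or ("root", dbfs_path) for a DBFS path."""
    # Accept both "dbfs:/..." and plain "/..." forms.
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute DBFS path, got {path!r}")

    best = None
    best_len = -1
    for m in mounts:
        mp = m.mount_point.rstrip("/")
        # Match whole path segments only, so "/mnt/rawdata" does not hit "/mnt/raw".
        if path == mp or path.startswith(mp + "/"):
            # Prefer the most specific mount point if several match.
            if len(mp) > best_len:
                best, best_len = m, len(mp)

    if best is None:
        # Not under any mount point: the path lives in the DBFS Root storage.
        return ("root", path)

    # Re-base the remainder of the path onto the mount's storage URI.
    rest = path[best_len:].lstrip("/")
    base = best.source.rstrip("/")
    return ("mount", f"{base}/{rest}" if rest else base)


if __name__ == "__main__":
    for p in ["dbfs:/mnt/raw/2024/events.json", "/mnt/rawdata/x.csv", "/tmp/scratch.csv"]:
        print(p, "->", resolve_dbfs_path(p, EXAMPLE_MOUNTS))
```

Note that the lookup takes no user identity as input. That mirrors why ordinary mounts are visible to everyone in the workspace, whereas passthrough mounts additionally rely on the calling user's own credentials when accessing the storage.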