Tags: databricks, azure-databricks

Azure Databricks - Export and Import DBFS filesystem


We have just created a new Azure Databricks resource in our resource group, which also contains an older Azure Databricks instance. I would like to copy the data stored in DBFS on the old instance into the new instance. How could I do that? My idea is to use filesystem commands to copy or move data from one DBFS to the other, probably by mounting the volumes, but I cannot figure out how to do it. Do you have any suggestions?

Thanks, Francesco


Solution

  • Unfortunately, there is no direct method to export and import files/folders from one workspace to another.

    Note: It is highly recommended that you do not store any production data in the default DBFS folders.

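    Before migrating anything, it can help to see what actually lives in the old workspace's DBFS. A minimal sketch using the Databricks CLI described later in this answer (dbfs:/FileStore is a standard DBFS root folder; adjust the paths to your own layout):

    # List the top-level folders in DBFS
    dbfs ls dbfs:/
    # Inspect one of the default folders, e.g. FileStore
    dbfs ls dbfs:/FileStore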

    How do you copy files/folders from one workspace to another?

    You need to manually download files/folders from one workspace and upload them to the other (a CLI sketch of this round trip is shown below).

    The easiest way is to use DBFS Explorer:

    Screenshot: https://i.sstatic.net/umF9y.jpg
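    Whichever tool you use, the overall flow is the same: download the data from the old workspace to a local machine, then upload it to the new workspace. A minimal sketch of that round trip with the Databricks CLI, assuming two configured connection profiles (the profile names old-workspace and new-workspace and the folder dbfs:/data are placeholders):

    # Pull a DBFS folder from the old workspace to the local machine
    dbfs cp -r --profile old-workspace dbfs:/data ./dbfs-export
    # Push the local copy to the same path in the new workspace
    dbfs cp -r --profile new-workspace ./dbfs-export dbfs:/data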

    Download file/folder from DBFS to the local machine:

    Method 1: Using the Databricks CLI

    The DBFS command-line interface (CLI) uses the DBFS API to expose an easy-to-use command-line interface to DBFS. Using this client, you can interact with DBFS using commands similar to those you use on a Unix command line. For example:

    # List files in DBFS
    dbfs ls
    # Put local file ./apple.txt to dbfs:/apple.txt
    dbfs cp ./apple.txt dbfs:/apple.txt
    # Get dbfs:/apple.txt and save to local file ./apple.txt
    dbfs cp dbfs:/apple.txt ./apple.txt
    # Recursively put local dir ./banana to dbfs:/banana
    dbfs cp -r ./banana dbfs:/banana
    

    References: "Installing and configuring Azure Databricks CLI" and "Azure Databricks – Access DBFS"
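    If the CLI is not installed yet, a minimal setup sketch (the host URL below is a placeholder; the token is a personal access token generated under User Settings in the workspace, and the named profile is only needed if you configure more than one workspace):

    # Install the Databricks CLI (requires Python and pip)
    pip install databricks-cli
    # Configure a connection, optionally under a named profile
    databricks configure --token --profile old-workspace
    # Databricks Host: https://adb-1234567890123456.7.azuredatabricks.net   (placeholder)
    # Token: <personal access token>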

    Method 2: Using the third-party tool DBFS Explorer

    DBFS Explorer was created as a quick way to upload files to and download files from the Databricks File System (DBFS). It works with both AWS and Azure instances of Databricks. You will need to create a bearer token in the web interface in order to connect.


    Upload file/folder from the local machine to DBFS:

    There are multiple ways to upload files from a local machine to the Azure Databricks DBFS folder.

    Method 1: Using the Azure Databricks portal

    (Screenshot: uploading files through the Azure Databricks portal UI.)
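    Files uploaded through the portal's Create Table / Upload File dialog typically land under dbfs:/FileStore/tables/. A quick way to confirm an upload from the CLI (the path is an assumption; adjust it to wherever you uploaded):

    # List the portal's default upload location
    dbfs ls dbfs:/FileStore/tables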

    Method 2: Using the Databricks CLI

    The same DBFS CLI described above works in the upload direction as well. For example:

    # Put local file ./apple.txt to dbfs:/apple.txt
    dbfs cp ./apple.txt dbfs:/apple.txt
    # Recursively put local dir ./banana to dbfs:/banana
    dbfs cp -r ./banana dbfs:/banana
    


    Method 3: Using the third-party tool DBFS Explorer

    As noted above, DBFS Explorer works with both AWS and Azure instances of Databricks; you will need a bearer token created in the web interface to connect.

    Step 1: Download and install DBFS Explorer.

    Step 2: Open DBFS Explorer and enter the Databricks URL and a personal access token.


    Step 3: Select the target folder in DBFS, drag and drop the files or folders from the local machine into it, and click Upload.

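    Once the upload has finished, you can confirm the contents of the new workspace's DBFS from the CLI (the profile name new-workspace and the path dbfs:/data are the placeholders used in the earlier sketch):

    # List the migrated folder in the new workspace
    dbfs ls --profile new-workspace dbfs:/data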