Tags: azure, pyspark, databricks, azure-databricks, azure-data-lake

How to transfer all the contents of one Azure Data Lake container to another using Databricks?


I have a container called test-container, and I'd like to move all the files and folders in test-container over to test2-container. How can I do this in a Databricks notebook using PySpark?


Solution

  • Assuming the containers are not public, you will need to mount both containers (use the root folder / if you want everything), then use the DBFS CLI to move files/folders between the mount points you created (a mount sketch in Python follows at the end of this answer).

    dbfs mv dbfs:/mnt/folder1 dbfs:/mnt/folder2

    If you change the access level of the containers to "Container (anonymous read access for containers and blobs)", you should be able to move files directly without even creating mounts.

    In a Databricks notebook, the code would look something like this:

    %fs mv /mnt/folder1 /mnt/folder2
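
    For the mounting step above, a minimal sketch of mounting both containers from a notebook is shown below. The storage account name, secret scope, and secret key names are placeholders you would replace with your own; an account key is used here for brevity, but service-principal OAuth configs are an equally valid option.

    ```python
    # Sketch only: "mystorageaccount", "my-scope" and "storage-key" are placeholder names.
    storage_account = "mystorageaccount"
    account_key = dbutils.secrets.get(scope="my-scope", key="storage-key")

    configs = {f"fs.azure.account.key.{storage_account}.dfs.core.windows.net": account_key}

    # Mount the source and destination containers side by side.
    mounts = [("test-container", "/mnt/folder1"), ("test2-container", "/mnt/folder2")]
    existing = {m.mountPoint for m in dbutils.fs.mounts()}

    for container, mount_point in mounts:
        if mount_point not in existing:  # skip containers that are already mounted
            dbutils.fs.mount(
                source=f"abfss://{container}@{storage_account}.dfs.core.windows.net/",
                mount_point=mount_point,
                extra_configs=configs,
            )
    ```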
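
    Inside the notebook, the same move can also be done from Python with dbutils.fs instead of the %fs magic or the DBFS CLI. This sketch moves every top-level file and folder under the source mount into the destination mount:

    ```python
    # Move everything under the source mount point into the destination mount point.
    # recurse=True is needed so that folders are moved together with their contents.
    for item in dbutils.fs.ls("/mnt/folder1"):
        dbutils.fs.mv(item.path, "/mnt/folder2/" + item.name, recurse=True)
    ```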