powershell, azure, azure-data-lake, u-sql

How can we copy any file within Azure Data Lake Store folders?


We already have Move-AzureRmDataLakeStoreItem, which moves files between folders inside Azure Data Lake Store. What I am looking for is a way to copy files within the Data Lake Store without affecting the original file.

The possibilities that I know of are:

  1. Using U-SQL to EXTRACT data from the source file and then OUTPUT it to the destination file. However, I am trying to copy all sorts of files (.gz, .txt, .info, .exe, .msi), and I am not sure whether U-SQL can handle .gz, .exe, or .msi files.
  2. Using Data Factory to copy data from/to Data Lake Store.

So my question is: do we have anything else at our disposal for copying files within Azure Data Lake Store?


Solution

  • You have a couple of other options:

    1. Run distcp on an HDI cluster, similar to the instructions provided here: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-copy-data-wasb-distcp
    2. Use adlcopy if you are copying a limited amount of data (say, tens to hundreds of GB): https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-copy-data-azure-storage-blob
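
    For the distcp option, a rough sketch of what the command might look like is shown here. The account name and folder paths are hypothetical placeholders, and it assumes the HDI cluster already has access to the Data Lake Store account; the linked instructions cover copying from WASB, so treat the adl-to-adl form as an untested adaptation. The script only prints the command so it can be reviewed before being run on the cluster.

```shell
# Sketch: build a DistCp command that copies one folder to another
# inside the same Data Lake Store account.
# Assumptions (not from the original answer): the names below are
# placeholders, and the cluster is configured to access the account.

ADLS_ACCOUNT="mydatalakestore"   # hypothetical account name
SRC_PATH="/raw/installers"       # hypothetical source folder
DST_PATH="/backup/installers"    # hypothetical destination folder

# Build an adl:// URI for a path in the given account.
adl_uri() {
  printf 'adl://%s.azuredatalakestore.net:443%s' "$1" "$2"
}

# Build the full DistCp command line: account, source path, destination path.
build_distcp_cmd() {
  printf 'hadoop distcp %s %s' "$(adl_uri "$1" "$2")" "$(adl_uri "$1" "$3")"
}

# Print the command for review; run the printed line over SSH on the cluster.
build_distcp_cmd "$ADLS_ACCOUNT" "$SRC_PATH" "$DST_PATH"
echo
```

    Since distcp copies at the file level, it should not care about the file type (.gz, .exe, .msi), and the source files are left in place.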

    Does this suffice? Or do you want something natively supported by Azure Data Lake Store via its REST APIs?

    Thanks, Sachin Sheth, Program Manager, Azure Data Lake.