Tags: azure, azure-hdinsight, mapr, distcp

How to distcp between a MapR filesystem and HDInsight Blob Storage


I'm trying to run the distcp command below, but it throws an exception:

hadoop distcp date_load=201901* wasb://dev3-spark@clusterdev.blob.core.windows.net/luiz/producao/performance/performance_report

The exception is as follows:

19/02/06 13:34:53 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
19/02/06 13:34:53 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
19/02/06 13:34:53 INFO impl.MetricsSystemImpl: azure-file-system metrics system started
19/02/06 13:34:53 ERROR tools.DistCp: Invalid arguments:
org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.AzureException: Container dev3-spark in account clusterdev.blob.core.windows.net not found, and we can't create it using anoynomous credentials.
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:938)
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:438)
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1048)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2693)
    at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:98)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2773)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2755)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:411)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:309)
    at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:216)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:116)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Caused by: org.apache.hadoop.fs.azure.AzureException: Container dev3-spark in account clusterdev.blob.core.windows.net not found, and we can't create it using anoynomous credentials.
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.connectUsingAnonymousCredentials(AzureNativeFileSystemStore.java:730)
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:933)
    ... 12 more
Invalid arguments: org.apache.hadoop.fs.azure.AzureException: Container dev3-spark in account clusterdev.blob.core.windows.net not found, and we can't create it using anoynomous credentials.


Solution

  • You can run distcp from your on-premises cluster to your Azure storage account:

    % hadoop distcp hdfs://<yourHostName>:9001/user/<yourUser>/<yourDirectory> wasbs://<yourStorageContainer>@<YourStorageAccount>.blob.core.windows.net/<yourDestinationDirectory>/
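
    The "anonymous credentials" part of the error suggests that the client found no access key for the storage account and fell back to anonymous access. Below is a minimal sketch of one way to supply the key, assuming the hadoop-azure property fs.azure.account.key.<account>.blob.core.windows.net and distcp's generic -D option. The helper function names, the AZURE_STORAGE_KEY variable, and the maprfs:/// source path are hypothetical placeholders, not taken from the question. The script only prints the command so it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch only: helper names and example values are illustrative assumptions.

# Build the wasbs:// destination URI.
# $1 = container, $2 = storage account name, $3 = path inside the container
wasb_uri() {
  printf 'wasbs://%s@%s.blob.core.windows.net/%s\n' "$1" "$2" "$3"
}

# Name of the hadoop-azure property that holds the account access key.
# $1 = storage account name
account_key_property() {
  printf 'fs.azure.account.key.%s.blob.core.windows.net\n' "$1"
}

# Print (do not run) the distcp command with the key passed via -D.
# $1 = source URI, $2 = container, $3 = account, $4 = destination path, $5 = key
print_distcp_cmd() {
  printf 'hadoop distcp -D %s=%s %s %s\n' \
    "$(account_key_property "$3")" "$5" "$1" "$(wasb_uri "$2" "$3" "$4")"
}

# Example with the container and account named in the error message.
# The source path is a hypothetical MapR location; replace it with your own.
print_distcp_cmd \
  'maprfs:///user/<yourUser>/<yourDirectory>' \
  'dev3-spark' \
  'clusterdev' \
  'luiz/producao/performance/performance_report' \
  "${AZURE_STORAGE_KEY:-<yourAccountKey>}"
```

    A key passed on the command line can be seen by other users in the process list, so setting the same property in core-site.xml on the source cluster may be preferable.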
    

    Hope this helps.