Tags: azure, hive, azure-storage, azure-blob-storage, azure-hdinsight

How can I access a storage container in Azure that belongs to a different account?


We are trying to share common data between more than one Outlook account. Let's say the data is stored in a container that belongs to [email protected]; I want to read it as [email protected], and my friend wants to read it from [email protected].

I have the common account's storage account name and container name (it is a public container), but when I try to read the data using Hive with the command below:

CREATE EXTERNAL TABLE deneme (t1 string, t2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'wasb://[email protected]/OUR_DATA.txt';

or with the command below:

CREATE EXTERNAL TABLE deneme (t1 string, t2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'wasb://[email protected]/OUR_DATA.txt?sig=ACCESS_KEY_OF_CONTAINER';

I get the error below:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.fs.azure.AzureException Uploads to public accounts using anonymous access is prohibited.)

We've tried a few things: we changed the container's access type to "Public Blob", but that didn't work. We also added our accounts to the storage account's default directory, which didn't work either. I tried to load the data with Pig, and that seemed to work, but when I ran DUMP, Pig also failed.

What seems strange to me is that when I run the command below on the Hadoop command line, it works perfectly:

hadoop fs -lsr wasb://[email protected]/

The output is:

lsr: DEPRECATED: Please use 'ls -R' instead.
-rwxrwxrwx   1  145391417 2015-05-18 10:58 wasb://[email protected]/OUR_DATA.txt
-rwxrwxrwx   1   25634418 2015-05-18 10:44 wasb://[email protected]/OUR_OTHER_DATA.txt

To sum up, our problem is reading data that lives in another Azure account's storage with our own Azure accounts, using HDInsight (Hive/Pig/Hadoop).


Solution

  • Does it work if you just point to the folder instead of a specific file? Hive expects locations to be folder paths, not specific files.

    CREATE EXTERNAL TABLE deneme (t1 string, t2 string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' 
    STORED AS TEXTFILE
    LOCATION 'wasb://[email protected]/';
    

    I was able to create a similar external table against a container configured as a "Public Container".

    If you don't want to use a public container, you can include the storage key in a configuration variable directly in a Hive script like:

    set fs.azure.account.key.storageaccount.blob.core.windows.net=ACCESS_KEY_OF_CONTAINER;
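
    As a rough illustration, a Hive script that combines both steps might look like the sketch below. The storage account, container, and key names are placeholders, not the actual values from the question:

    -- make the other account's storage key available to the WASB driver
    set fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net=STORAGE_ACCOUNT_ACCESS_KEY;

    -- point the external table at the folder (not a single file) in the other account's container
    CREATE EXTERNAL TABLE deneme (t1 string, t2 string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
    STORED AS TEXTFILE
    LOCATION 'wasb://CONTAINER_NAME@STORAGE_ACCOUNT_NAME.blob.core.windows.net/';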
    

    Or you can grant the cluster access to the storage account at provisioning time, either through the Additional Storage Accounts section of the custom create wizard, or by using the Add-AzureHDInsightStorage cmdlet to modify the cluster configuration before the cluster is created (see the sketch below).
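
    For the PowerShell route, a minimal sketch using the classic Azure Service Management cmdlets that were current for HDInsight at the time (all storage account, container, and cluster names below are placeholders) might look like:

    # build a cluster configuration with a default storage account...
    $config = New-AzureHDInsightClusterConfig -ClusterSizeInNodes 2

    $config = Set-AzureHDInsightDefaultStorage -Config $config `
        -StorageAccountName "DEFAULT_STORAGE_ACCOUNT.blob.core.windows.net" `
        -StorageAccountKey $defaultStorageKey `
        -StorageContainerName "DEFAULT_CONTAINER"

    # ...and add the shared storage account so the cluster can read its containers
    $config = Add-AzureHDInsightStorage -Config $config `
        -StorageAccountName "SHARED_STORAGE_ACCOUNT.blob.core.windows.net" `
        -StorageAccountKey $sharedStorageKey

    New-AzureHDInsightCluster -Name "CLUSTER_NAME" -Location "AZURE_REGION" `
        -Credential (Get-Credential) -Config $config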

    This article has a bunch of related information on the interactions between HDInsight and Azure Blob Storage: http://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-use-blob-storage/