I am using Hive on HDInsight, and I want to store the output of the job in Azure Blob Storage. I tried:
INSERT OVERWRITE DIRECTORY 'wasb://[email protected]/'
SELECT name, COUNT(*) as count FROM test
GROUP BY name
ORDER BY count DESC
But this returned the error "Error: java.lang.RuntimeException: Error in configuring object". Can you please help me redirect the output of the job to Azure Blob Storage?
To point to Azure Blob Storage, you need to use the wasb:// or wasbs:// URI prefix, like:
INSERT OVERWRITE DIRECTORY 'wasb://[email protected]/output' ...
This article has lots of examples: http://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-use-blob-storage/
I think you also need to provide a directory in the path. It looks like INSERT OVERWRITE expects to be able to operate on the directory in a way that is not allowed at the root. Can you try:
INSERT OVERWRITE DIRECTORY 'wasb://[email protected]/output'
SELECT name, COUNT(*) as count FROM test
GROUP BY name
ORDER BY count DESC;
Also, don't forget to terminate the statement with a semicolon (;).
Lastly, if the above does not work, can you confirm that you have access to the storage account in question from the Hive session by just running:
dfs -ls wasb://[email protected]/;
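Putting the suggestions above together, a minimal sketch of the whole sequence might look like the following. The `<container>` and `<account>` parts are placeholders I made up for illustration (substitute your own container and storage account names), and `output` is just an example directory name.

```sql
-- Placeholders (assumptions): replace <container> and <account> with your own values.

-- 1. Check that the Hive session can list the container at all.
dfs -ls wasb://<container>@<account>.blob.core.windows.net/;

-- 2. Write the query result to a subdirectory rather than the container root.
INSERT OVERWRITE DIRECTORY 'wasb://<container>@<account>.blob.core.windows.net/output'
SELECT name, COUNT(*) AS count
FROM test
GROUP BY name
ORDER BY count DESC;

-- 3. List the files Hive wrote to the target directory.
dfs -ls wasb://<container>@<account>.blob.core.windows.net/output;
```

Keep in mind that INSERT OVERWRITE DIRECTORY replaces whatever is already in the target directory, so it is safer to point it at a directory used only for this output.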