azure hadoop hive azure-hdinsight

HDInsight Hive output to blob


I am using Hive on HDInsight, and I want to store the output of a job in Azure Blob storage. I tried:

INSERT OVERWRITE DIRECTORY 'wasb://[email protected]/'

SELECT name, COUNT(*) as count FROM test
  GROUP BY name
  ORDER BY count DESC

But this returned the error "Error: java.lang.RuntimeException: Error in configuring object". Can you please help me redirect the output of the job to Azure Blob storage?


Solution

  • To point to Azure Blob Storage, you need to use the wasb:// or wasbs:// URI prefix, for example:

    INSERT OVERWRITE DIRECTORY 'wasb://[email protected]/output' ...
    

    This article has lots of examples: http://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-use-blob-storage/

    I think you also need to provide a directory in the path. INSERT OVERWRITE appears to operate on the target directory in a way that is not allowed at the container root. Can you try:

    INSERT OVERWRITE DIRECTORY 'wasb://[email protected]/output'
    
    SELECT name, COUNT(*) as count FROM test
      GROUP BY name
      ORDER BY count DESC;
    

    Also, don't forget to terminate the statement with a semicolon (;).

    Lastly, if the above does not work, can you confirm that the Hive session has access to the storage account in question by running:

    dfs -ls wasb://[email protected]/;
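
Once the statement succeeds, the same dfs command can be pointed at the output directory to see what Hive wrote. The following is only a sketch under some assumptions: <container> and <account> are placeholders for your own container and storage account names, /output is the subdirectory from the example above, and cnt is an alias chosen here in place of count. The ROW FORMAT clause is optional. Without it, Hive separates columns with the Ctrl-A character, and as far as I know older Hive versions accept the clause only for LOCAL directories, so drop that line if your cluster rejects it.

```sql
-- Placeholders: replace <container> and <account> with your own values.
-- The target is a subdirectory (/output), not the container root.
-- ROW FORMAT is optional and asks for comma-separated output
-- instead of the default Ctrl-A delimiter.
INSERT OVERWRITE DIRECTORY 'wasb://<container>@<account>.blob.core.windows.net/output'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT name, COUNT(*) AS cnt FROM test
  GROUP BY name
  ORDER BY cnt DESC;

-- List the files Hive wrote to the output directory.
dfs -ls wasb://<container>@<account>.blob.core.windows.net/output/;
```

If the listing shows one or more data files under /output, the job output reached the blob container. If the listing itself fails, the problem is more likely access to the storage account than the query.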