
Read a file from Azure storage (wasbs) in a Scala application


My cluster needs to read some input files that are located in my Azure storage. I submit my .jar to the cluster through Livy, but the job always dies because it cannot locate the files: User class threw exception: java.io.FileNotFoundException. What am I missing? I don't want to use sc.textFile to open the files, because that would turn them into RDDs and I need to keep their structure intact.

val Inputs : String = scala.io.Source.fromFile("wasbs:///inputs.txt").mkString

I believe I am reading from the wrong location or with the wrong method. Any ideas?

Thanks!


Solution

  • As I understand your description, you want to load a plain text file from Azure Storage using Scala running on HDInsight.

    In my experience, there are two approaches you can try.

    1. Use the Azure Storage SDK for Java from Scala to fetch the content of the text blob. See the tutorial "How to use Blob storage from Java"; rewriting its sample code in Scala is straightforward.

    2. Use the Hadoop FileSystem API together with the Hadoop Azure support library to load the file data. See the Hadoop example wiki at https://wiki.apache.org/hadoop/HadoopDfsReadWriteExample and write the equivalent code in Scala.
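
For the second approach, here is a minimal sketch. The likely cause of the FileNotFoundException is that scala.io.Source.fromFile goes through java.io.File, which only knows the local file system and treats "wasbs:///inputs.txt" as a local path. The Hadoop FileSystem API instead picks the implementation from the URI scheme. The object name ReadWasbFile and the helper readText are made up for illustration. The sketch also assumes that the cluster's core-site.xml (picked up by new Configuration()) already carries the storage account settings and that the hadoop-azure library is on the classpath, which is the usual setup on HDInsight but worth verifying.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

import scala.io.Source

// Hypothetical helper object; the names are illustrative, not part of any library.
object ReadWasbFile {

  /** Reads the whole file at `location` into a String through the Hadoop FileSystem API. */
  def readText(location: String, conf: Configuration = new Configuration()): String = {
    val path = new Path(location)
    // The FileSystem implementation is chosen from the URI scheme (wasbs, hdfs, file, ...).
    val fs = path.getFileSystem(conf)
    val in = fs.open(path)
    try {
      // Assumption: the file is UTF-8 encoded text.
      Source.fromInputStream(in, "UTF-8").mkString
    } finally {
      in.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // "wasbs:///inputs.txt" relies on the cluster's default container being configured.
    // The fully qualified form is wasbs://<container>@<account>.blob.core.windows.net/<path>.
    val inputs: String = readText("wasbs:///inputs.txt")
    println(inputs.take(200))
  }
}
```

Inside a Spark job, passing sc.hadoopConfiguration as the second argument instead of a fresh Configuration is probably the safer choice, since it reflects whatever storage settings the job was submitted with. The result is an ordinary String on the driver, so the file keeps its original structure and never becomes an RDD.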