I run a virtual machine with local instances of Hadoop and Spark-JobServer on it. I created a file named 'test.txt' on HDFS that I want to open from the Spark-JobServer. I wrote the following code to do this:
val test1 = sc.textFile("hdfs://quickstart.cloudera:8020/test.txt")
val test2 = test1.count
return test2
However, when I want to run these lines, I get an error in the Spark-JobServer:
"Input path does not exist: hdfs://quickstart.cloudera:8020/test.txt"
I looked up the path to HDFS with hdfs getconf -confKey fs.defaultFS
and it showed me hdfs://quickstart.cloudera:8020
as the path. Why can I not access the test.txt file if this is the correct path to HDFS? And if this is the incorrect path, how can I find the correct one?
Your file is not in the root directory.
You will find your file under hdfs:///user/<your username>/test.txt
When you do a hadoop fs -put without specifying a destination, the file goes into your user's home directory on HDFS, not into the root directory.
Check the output of the following to verify this:
hadoop fs -cat test.txt
hadoop fs -cat /test.txt
If you want the file in the root directory instead, do hadoop fs -put test.txt /
and see if your Spark code works.
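For reference, a minimal Spark-JobServer job using the corrected path might look like the sketch below. It assumes the quickstart VM's default username is cloudera (substitute your own user's home directory) and uses the legacy SparkJob API from the spark.jobserver package:

```scala
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}

object CountTestFile extends SparkJob {
  // Accept any config; a real job might validate input parameters here
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any = {
    // Full path under the user's HDFS home dir ("cloudera" is assumed here)
    val test1 = sc.textFile("hdfs://quickstart.cloudera:8020/user/cloudera/test.txt")
    test1.count
  }
}
```

The line count is the job's return value, which the jobserver serializes into the job result you see when polling the REST API.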