I have a valid JSON file that I can successfully import on a local Spark machine:
DF = sqlContext.read.json("/home/me/myfile.json")
I have a shell script to submit the job:
/home/me/spark/bin/spark-submit \
--master local[*] Code.py
So far so good; for example, DF.show(1) works fine.
Now I am trying to load the data from an s3a link (which contains exactly the same data as myfile.json).
I have tried:
DF = sqlContext.read.json("s3a://some-bucket/myfile.json")
I still run my shell script, which contains the same command:
/home/me/spark/bin/spark-submit \
--master local[*] Code.py
But this time it does not work; I get the following error:
java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
Is my shell script wrong?
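The ClassNotFoundException means the hadoop-aws connector, which contains org.apache.hadoop.fs.s3a.S3AFileSystem, is not on Spark's classpath, so the submit command needs that dependency added. A minimal sketch in Python that builds the spark-submit argument list with the `--packages` flag, assuming the Spark home and script name from the question; the default `hadoop_aws_version` of `2.7.3` is only a placeholder and has to match the Hadoop version your Spark build was compiled against.

```python
import shlex
from typing import List


def build_submit_command(
    spark_home: str = "/home/me/spark",
    script: str = "Code.py",
    master: str = "local[*]",
    hadoop_aws_version: str = "2.7.3",  # placeholder: match Spark's bundled Hadoop version
) -> List[str]:
    """Return a spark-submit argv that pulls in the s3a connector via --packages."""
    # Maven coordinates groupId:artifactId:version; hadoop-aws provides S3AFileSystem
    package = f"org.apache.hadoop:hadoop-aws:{hadoop_aws_version}"
    # Options must come before the application file; anything after it
    # is passed to Code.py as application arguments.
    return [
        f"{spark_home}/bin/spark-submit",
        "--master", master,
        "--packages", package,
        script,
    ]


if __name__ == "__main__":
    # Print the equivalent shell command, with local[*] quoted for the shell
    print(shlex.join(build_submit_command()))
```

Passing the list to `subprocess.run(cmd, check=True)` would actually submit the job. Using `--packages` lets spark-submit resolve the jar and its dependencies from Maven instead of copying jars by hand; placing a matching jar on the classpath manually is the alternative.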
PS: I just got the s3a link from someone else, so the bucket is not in my AWS account. I assume that I can still import the data from that link even though I do not have an access key or secret key.
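Whether this works without keys depends on the bucket owner: s3a can only read without credentials if the bucket allows anonymous (public) reads. A hedged sketch that builds the relevant Hadoop settings as `spark.hadoop.*` properties (Spark copies these into the Hadoop configuration), assuming a hadoop-aws version that supports `fs.s3a.aws.credentials.provider` (Hadoop 2.8 or later, to the best of my knowledge); the helper names `s3a_conf` and `as_conf_flags` are hypothetical.

```python
from typing import Dict, List, Optional

ANON_PROVIDER = "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"


def s3a_conf(access_key: Optional[str] = None,
             secret_key: Optional[str] = None) -> Dict[str, str]:
    """Return spark.hadoop.* settings for reading from s3a.

    With no keys, fall back to anonymous access, which only succeeds
    if the bucket owner allows public reads.
    """
    if (access_key is None) != (secret_key is None):
        raise ValueError("provide both access_key and secret_key, or neither")
    if access_key is None:
        return {"spark.hadoop.fs.s3a.aws.credentials.provider": ANON_PROVIDER}
    return {
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
    }


def as_conf_flags(conf: Dict[str, str]) -> List[str]:
    """Turn a settings dict into repeated `--conf key=value` spark-submit flags."""
    flags: List[str] = []
    for key, value in sorted(conf.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags
```

Passing secret keys as `--conf` flags makes them visible in the process list, so for keyed access environment variables or a config file are usually preferable.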
Finally, I resolved the issue by adding the right .jar file (see my comment below) and setting AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in spark-env.sh, which is located in the conf folder of my Spark folder.
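A small sketch of that last step in Python, assuming the keys are already available as environment variables when the script runs; the helpers `spark_env_lines` and `append_to_spark_env` are hypothetical. Spark's launch scripts source conf/spark-env.sh as a shell script, so each value is written as an `export` line and quoted with `shlex.quote` in case it contains shell metacharacters.

```python
import os
import shlex
from typing import Mapping

AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def spark_env_lines(env: Mapping[str, str] = os.environ) -> str:
    """Render export lines for conf/spark-env.sh from the given environment."""
    missing = [name for name in AWS_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"missing environment variables: {', '.join(missing)}")
    # spark-env.sh is sourced by the shell, so quote values defensively
    return "".join(f"export {name}={shlex.quote(env[name])}\n" for name in AWS_VARS)


def append_to_spark_env(spark_home: str, text: str) -> None:
    """Append the rendered lines to $SPARK_HOME/conf/spark-env.sh (created if absent)."""
    path = os.path.join(spark_home, "conf", "spark-env.sh")
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(text)
```

For example, `append_to_spark_env("/home/me/spark", spark_env_lines())` would add both exports for the Spark folder used in the question. Keeping the keys in spark-env.sh rather than in Code.py keeps them out of the application source.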