I am new to Spark and AWS.
I have a DynamoDB table in AWS. I created a Spark cluster on EMR with Hive. Using the Hive shell, I created an external table “RawData” that connects to the DynamoDB table.
Now, when I start spark-shell with the DynamoDB dependency jars (--jars /usr/share/aws/emr/ddb/lib/emr-ddb-hive.jar,/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar),
I can query the table “RawData” with HiveContext
and get results.
But when I submit my Spark program with spark-submit, I see a Spark exception in the terminal, and in the logs I found: "org.apache.spark.sql.AnalysisException: no such table RawData".
This is how I created the cluster: aws emr create-cluster --name MyCluster --release-label emr-4.0.0 --applications Name=Spark Name=Hive ...
Please advise what I’m doing wrong. Lev
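I found what was missing from the submit command.
I had to add --files /etc/hive/conf/hive-site.xml
as one of the arguments of spark-submit. As far as I understand, without hive-site.xml the submitted job cannot locate the Hive metastore where “RawData” is defined, so HiveContext falls back to an empty local metastore and reports that the table does not exist.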
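
For anyone hitting the same error, here is a rough sketch of what the full submit command might look like with that argument added. The class name com.example.MyApp and the jar my-app.jar are hypothetical placeholders, not from my actual job, and passing the same --jars as in spark-shell is an assumption. The snippet only prints the command so it can be reviewed first.

```shell
# Paths taken from the post above.
HIVE_SITE=/etc/hive/conf/hive-site.xml
DDB_LIB=/usr/share/aws/emr/ddb/lib
DDB_JARS="$DDB_LIB/emr-ddb-hive.jar,$DDB_LIB/emr-ddb-hadoop.jar"

# Hypothetical placeholders: replace with your own main class and application jar.
APP_CLASS=com.example.MyApp
APP_JAR=my-app.jar

# --files ships hive-site.xml with the job so HiveContext can find the metastore.
# --jars is assumed to be needed here just as it was for spark-shell.
SUBMIT_CMD="spark-submit --class $APP_CLASS --jars $DDB_JARS --files $HIVE_SITE $APP_JAR"

# Dry run: print the command. On the cluster, run it with: $SUBMIT_CMD
echo "$SUBMIT_CMD"
```

Once the printed command looks right, run it on the master node of the EMR cluster.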