Tags: hadoop, amazon-web-services, apache-spark, hive, emr

Spark cannot see hive external table


I am a newbie to Spark and AWS.

I have a DynamoDB table in AWS. I created a Spark cluster on EMR with Hive. Using the Hive shell, I created an external table "RawData" that connects to DynamoDB.

When I start spark-shell with the DynamoDB dependency jars (--jars /usr/share/aws/emr/ddb/lib/emr-ddb-hive.jar,/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar), I can query the table "RawData" through HiveContext and get results. But when I submit my Spark program with spark-submit, I see a Spark exception in the terminal, and in the logs I find: "org.apache.spark.sql.AnalysisException: no such table RawData".

This is how I created the cluster: aws emr create-cluster --name MyCluster --release-label emr-4.0.0 --applications Name=Spark Name=Hive ...
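For reference, a fuller version of that create-cluster call might look like the sketch below. Only the name, release label, and applications come from the question; the instance type, instance count, key name, and roles flag are illustrative placeholders, not the poster's actual settings.

```shell
# Hypothetical sketch of the EMR cluster creation. Instance type/count,
# key name, and --use-default-roles are assumed placeholders.
aws emr create-cluster \
  --name MyCluster \
  --release-label emr-4.0.0 \
  --applications Name=Spark Name=Hive \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --ec2-attributes KeyName=my-key-pair \
  --use-default-roles
```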

Please advise what I'm doing wrong. Lev


Solution

  • I found what was missing from the submit command: I had to add --files /etc/hive/conf/hive-site.xml to the arguments of spark-submit. Without hive-site.xml on the classpath, the driver does not connect to the Hive metastore, so the external table is not visible.
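Putting the pieces together, a working invocation might look like the sketch below. The connector jar paths and the --files fix come from the question and answer; the application jar (my-app.jar) and main class (com.example.Main) are hypothetical placeholders.

```shell
# Sketch of the corrected spark-submit call. The DynamoDB connector jars
# and hive-site.xml path are from the post; my-app.jar and
# com.example.Main are assumed placeholders for the user's application.
spark-submit \
  --class com.example.Main \
  --jars /usr/share/aws/emr/ddb/lib/emr-ddb-hive.jar,/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar \
  --files /etc/hive/conf/hive-site.xml \
  my-app.jar
```

Passing hive-site.xml via --files ships the metastore configuration to the driver and executors, which is what lets the HiveContext resolve the "RawData" table.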