We are in the process of migrating all of our Spark jobs, written in Scala, to AWS Glue.
Current flow: Apache Hive -> Spark (processing/transformation) -> Apache Hive -> BI
Required flow: AWS S3 (Athena) -> AWS Glue (Spark Scala, processing/transformation) -> AWS S3 -> Athena -> BI
TBH I got this task yesterday and I am doing R&D on it. Here is what I have found so far:
I am able to run my current code with minor changes. I built a SparkSession and used that session to query a Glue Data Catalog (Hive-compatible) table. The job needs this parameter: --enable-glue-datacatalog
import org.apache.spark.sql.SparkSession

// Build the session with Hive support so spark.sql resolves tables through the Glue Data Catalog
val spark = SparkSession.builder().appName("SPARK-DEVELOPMENT").enableHiveSupport().getOrCreate()
spark.sql("use default")
spark.sql("select * from testhive").show()
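For the required flow, the transformed result also has to land back in S3 in a format Athena can read. A minimal sketch of that step, assuming the --enable-glue-datacatalog parameter is set; the bucket path and output table name here are placeholders, not anything from our actual job:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("SPARK-DEVELOPMENT").enableHiveSupport().getOrCreate()

// Read the source table from the Glue Data Catalog
val df = spark.sql("select * from default.testhive")

// ... existing Scala transformations would go here ...

// Write the result to S3 as Parquet and register it in the catalog so
// Athena can query it; "s3://my-output-bucket/testhive_out/" is a hypothetical path
df.write
  .mode(SaveMode.Overwrite)
  .format("parquet")
  .option("path", "s3://my-output-bucket/testhive_out/")
  .saveAsTable("default.testhive_out")
```

Because saveAsTable goes through the metastore (the Glue Data Catalog when the parameter is enabled), the output table should show up in Athena without a separate CREATE EXTERNAL TABLE step, though I have not verified this end to end yet.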