amazon-web-services, apache-spark, aws-glue, aws-glue-data-catalog

AWS Glue with Athena


We are migrating all of our Spark jobs, written in Scala, to AWS Glue.

Current flow: Apache Hive -> Spark (processing/transformation) -> Apache Hive -> BI

Required flow: AWS S3 (Athena) -> AWS Glue (Spark Scala: processing/transformation) -> AWS S3 -> Athena -> BI

To be honest, I was given this task yesterday and I am still researching it. My questions are:

  1. Can we run the same code in AWS Glue? Glue has DynamicFrames, which can be converted to DataFrames, but that would require code changes.
  2. Can we read data from AWS Athena using the Spark SQL API in AWS Glue, as we normally do in Spark?

Solution

  • I was able to run my existing code with minor changes. I build a SparkSession and use that session to query the Hive-compatible tables in the Glue Data Catalog. The job needs this parameter: --enable-glue-datacatalog

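    import org.apache.spark.sql.SparkSession

    // Build (or reuse) the session; the original snippet never assigned it to a variable
    val spark = SparkSession.builder().appName("SPARK-DEVELOPMENT").getOrCreate()
    val sqlContext = spark.sqlContext
    sqlContext.sql("use default")
    sqlContext.sql("select * from testhive").show()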
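
  • For the first question, the DynamicFrame route might look roughly like the sketch below. It is an illustration rather than code I ran as part of this migration: it assumes the Glue Scala library is on the classpath (as it is inside a Glue job), it reuses the database and table names from the snippet above, and the object name GlueDynamicFrameSketch is made up for the example.

```scala
import com.amazonaws.services.glue.{DynamicFrame, GlueContext}
import org.apache.spark.SparkContext
import org.apache.spark.sql.DataFrame

// Hypothetical job object; the name is only for illustration
object GlueDynamicFrameSketch {
  def main(sysArgs: Array[String]): Unit = {
    val sc = new SparkContext()
    val glueContext = new GlueContext(sc)

    // Read the Data Catalog table as a DynamicFrame
    // ("default" and "testhive" are the names used in the snippet above)
    val dyf: DynamicFrame = glueContext
      .getCatalogSource(database = "default", tableName = "testhive")
      .getDynamicFrame()

    // Convert to a plain Spark DataFrame so existing transformation code can be reused
    val df: DataFrame = dyf.toDF()
    df.show()

    // Convert back only if a Glue-specific transform or sink is needed afterwards
    val backToDynamicFrame: DynamicFrame = DynamicFrame(df, glueContext)
  }
}
```

    If the existing jobs only use DataFrames and Spark SQL, the SparkSession approach above avoids this conversion entirely; the DynamicFrame path mainly matters when Glue-specific transforms or sinks are wanted.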