amazon-web-services apache-spark amazon-emr aws-glue aws-glue-data-catalog

AWS Glue Data Catalog, temporary tables and Apache Spark createOrReplaceTempView


According to the AWS Glue Data Catalog documentation (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-glue.html):

Temporary tables are not supported.

It is not clear to me whether "temporary tables" here also covers the temporary views that can be created in Apache Spark with the DataFrame.createOrReplaceTempView method.

In other words, am I right that I can't use the DataFrame.createOrReplaceTempView method with AWS Glue and the AWS Glue Data Catalog? Can I currently work only with permanent tables/views in AWS Glue and the AWS Glue Data Catalog, and must I use an Amazon EMR cluster for full-featured Apache Spark functionality?


Solution

  • You can use DataFrame.createOrReplaceTempView() in AWS Glue. You first have to convert the DynamicFrame to a Spark DataFrame using toDF().

    However, these views remain scoped to your current Glue job instance (its Spark session) and won't be accessible from other Glue jobs, other instances of the same job, or Athena.