I'm trying to create a Glue DynamicFrame from an Athena table, but I keep getting an empty frame.
The Athena table is part of my Glue Data Catalog.
The create_dynamic_frame.from_catalog call doesn't raise any error. As a sanity check, I tried loading a random table name, and it did complain.
I know the Athena table has data, since querying the exact same table in Athena returns results.
The table is an external, partitioned JSON table on S3.
I'm using PySpark as shown below:
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext

# Create a Glue context
glueContext = GlueContext(SparkContext.getOrCreate())

# Create a DynamicFrame from the 'raw_***' table in the Data Catalog
raw_data_df = glueContext.create_dynamic_frame.from_catalog(
    database="***",
    table_name="raw_***")

# Print the record count; I'm getting zero here
print("Count: %d" % raw_data_df.count())

# I'm also getting nothing here
raw_data_df.printSchema()
Is anyone else facing the same issue? Could this be a permissions issue or a Glue bug, since no errors are raised?
There are several poorly documented features and gotchas in Glue, which is sometimes frustrating.
I would suggest investigating the following configurations of your Glue job:
I have also written a blog post on LinkedIn about other Glue gotchas, if that helps.
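One job setting that could be worth checking is the job bookmark option, since an enabled bookmark can make a job skip data it considers already processed. This is purely an assumption and is not confirmed as the cause here. Below is a minimal sketch that reads the setting from a job definition. The helper names bookmark_option and bookmarks_enabled are hypothetical, and the input is assumed to have the shape of the "Job" entry returned by boto3's get_job call.

```python
def bookmark_option(job_definition):
    """Return the job bookmark setting found in a Glue job definition dict.

    Assumption: job_definition looks like the "Job" entry returned by
    boto3's glue.get_job(JobName=...), where job parameters live under
    "DefaultArguments".
    """
    args = job_definition.get("DefaultArguments") or {}
    return args.get("--job-bookmark-option", "not set")


def bookmarks_enabled(job_definition):
    """True only when the job explicitly enables bookmarks."""
    return bookmark_option(job_definition) == "job-bookmark-enable"


# Usage against a real job (needs boto3 and AWS credentials;
# "my-job" is a hypothetical job name):
#   import boto3
#   job = boto3.client("glue").get_job(JobName="my-job")["Job"]
#   print(bookmark_option(job))
```

If this reports job-bookmark-enable, rerunning the job with bookmarks disabled or reset is one way to see whether the empty frame is related.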