
AWS Glue push_down_predicate not working with DynamoDB


I'm using AWS Glue to read data from a DynamoDB table where the sort key sk (string) is a timestamp in the format 2024-04-10T00:00:00.000000+00:00. I'm trying to apply a push_down_predicate to filter records within a specific time range, but I'm getting unexpected results, including timestamps outside the specified range.

What I've Tried:

  1. DynamoDB Query: When I query directly from DynamoDB using the same timestamp format, the results are as expected.
  2. AWS Glue Job:
dynamic_frame = glueContext.create_dynamic_frame.from_catalog(
    database="my_database",
    table_name="my_dynamodb_table",
    push_down_predicate=f"sk >= '{start_timestamp}' AND sk < '{end_timestamp}'"
)
Here, `start_timestamp` and `end_timestamp` match the format in DynamoDB.

Observed Behavior: Instead of getting filtered results within the specified timestamp range, I'm seeing a mix of timestamps, including many outside the range.

Question:

Why isn't the push_down_predicate filtering the DynamoDB data as expected through AWS Glue, and how can I correctly apply this filter to get only the timestamps within the specified range?


Solution

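  • The AWS Glue DynamoDB connector does not support push_down_predicate filtering. That option is meant for pruning partitions of partitioned Data Catalog tables (typically backed by Amazon S3). A DynamoDB source has no such partitions, so the predicate has no effect and Glue reads the whole table, which is why timestamps outside the range appear. To get only the requested range, filter on sk after the data has been loaded. See the connector documentation:

    https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect-dynamodb-home.html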
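
One way to apply the range after loading is sketched below. The helper `make_sk_range_filter` and the sample records are hypothetical and not part of the original question. The sketch assumes every `sk` value uses the same fixed-width format and the same `+00:00` offset, so that plain string comparison matches chronological order. The Glue-specific call is left as a comment because it needs the awsglue runtime.

```python
def make_sk_range_filter(start_timestamp, end_timestamp):
    """Return a predicate that keeps records with start <= sk < end.

    Relies on string comparison, which is only chronological when all
    timestamps share one fixed-width format and one UTC offset (assumption).
    """
    def keep(record):
        sk = record["sk"]
        return start_timestamp <= sk < end_timestamp
    return keep


# Hypothetical stand-in records; in a Glue job these would be the rows
# of the DynamicFrame read from the DynamoDB table.
records = [
    {"sk": "2024-04-09T23:59:59.999999+00:00"},
    {"sk": "2024-04-10T00:00:00.000000+00:00"},
    {"sk": "2024-04-10T12:30:00.000000+00:00"},
    {"sk": "2024-04-11T00:00:00.000000+00:00"},
]

keep = make_sk_range_filter(
    "2024-04-10T00:00:00.000000+00:00",
    "2024-04-11T00:00:00.000000+00:00",
)
filtered = [r for r in records if keep(r)]
for r in filtered:
    print(r["sk"])

# Inside the Glue job (not executed here), the same predicate could be
# passed to the Filter transform:
#   from awsglue.transforms import Filter
#   filtered_frame = Filter.apply(frame=dynamic_frame, f=keep)
```

Because the connector still scans the full table, this reduces the data passed to later stages of the job but not the read capacity consumed on the DynamoDB side.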