I'm using AWS Glue to read data from a DynamoDB table where the sort key `sk` (string) is a timestamp in the format `2024-04-10T00:00:00.000000+00:00`. I'm trying to apply a `push_down_predicate` to filter records within a specific time range, but I'm getting unexpected results, including timestamps outside the specified range.
What I've Tried:
    dynamic_frame = glueContext.create_dynamic_frame.from_catalog(
        database="my_database",
        table_name="my_dynamodb_table",
        push_down_predicate=f"sk >= '{start_timestamp}' AND sk < '{end_timestamp}'"
    )
Here, `start_timestamp` and `end_timestamp` match the format in DynamoDB.
Observed Behavior: Instead of getting filtered results within the specified timestamp range, I'm seeing a mix of timestamps, including many outside the range.
Question:
Why isn't the `push_down_predicate` filtering the DynamoDB data as expected through AWS Glue, and how can I correctly apply this filter to get only the timestamps within the specified range?
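The AWS Glue DynamoDB connector does not support push-down predicate filtering, which is consistent with the unfiltered results you are seeing. See the connector documentation:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect-dynamodb-home.html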
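One possible workaround is to load the table without `push_down_predicate` and filter on `sk` afterwards. The sketch below is only an illustration under a few assumptions: `make_range_filter` and `in_range` are hypothetical helper names, the sample records are made up, and the commented-out `Filter.apply` call assumes the standard `awsglue.transforms.Filter` transform is available in your job. Note that this still reads the whole table; it only trims the result.

```python
from datetime import datetime


def make_range_filter(start_timestamp: str, end_timestamp: str):
    """Build a predicate that keeps records with start <= sk < end.

    Timestamps are parsed rather than compared as strings, so the check
    also holds if the UTC offsets differ between values.
    """
    start = datetime.fromisoformat(start_timestamp)
    end = datetime.fromisoformat(end_timestamp)

    def in_range(record) -> bool:
        # `record` only needs to support record["sk"] (a dict here,
        # a DynamicRecord inside a Glue job).
        return start <= datetime.fromisoformat(record["sk"]) < end

    return in_range


in_range = make_range_filter(
    "2024-04-10T00:00:00.000000+00:00",
    "2024-04-11T00:00:00.000000+00:00",
)

# Inside a Glue job (assumption: dynamic_frame was created without
# push_down_predicate):
#   from awsglue.transforms import Filter
#   filtered = Filter.apply(frame=dynamic_frame, f=in_range)

# Local check with made-up records:
sample = [
    {"sk": "2024-04-09T23:59:59.999999+00:00"},
    {"sk": "2024-04-10T00:00:00.000000+00:00"},
    {"sk": "2024-04-10T12:30:00.000000+00:00"},
    {"sk": "2024-04-11T00:00:00.000000+00:00"},
]
kept = [r["sk"] for r in sample if in_range(r)]
```

The end bound is exclusive, matching the `sk >= ... AND sk < ...` condition in the question.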