I have a table with JSON files in S3 as its source, partitioned by:
year
month
day
hour
Partition projection is enabled (projection.enabled = true), with standard ranges for these partition keys.
Running a query like:
SELECT count(*) FROM my_table WHERE year=2022 and month=10 and day=28 or day=29 or day=30
Took:
Initially I expected the time to be constant: I thought Athena would spin up as many "crawlers" as there are files to scan.
How can I predict how this will scale?
While it is very hard to predict how Athena scales, I can say that the V3 engine is much faster than the V2 engine.
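One thing worth checking in the query from the question: in SQL, AND binds more tightly than OR. Written without parentheses, the WHERE clause is read as (year=2022 AND month=10 AND day=28) OR day=29 OR day=30, so days 29 and 30 match in every year and month, not just October 2022. That can mean far more partitions scanned than intended, which affects the timings. Below is a minimal sketch of the effect, using Python's built-in sqlite3 as a stand-in for Athena (an assumption made only to keep it runnable locally; the table contents are made up for illustration):

```python
import sqlite3

# In-memory stand-in for the partitioned table. SQLite is used here only
# because it applies the same AND-before-OR precedence as standard SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (year INT, month INT, day INT)")
rows = [
    (2022, 10, 28),
    (2022, 10, 29),
    (2022, 10, 30),
    (2022, 9, 29),   # different month, illustrative row
    (2021, 10, 30),  # different year, illustrative row
]
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)

# The filter as written in the question: no parentheses around the ORs.
as_written = conn.execute(
    "SELECT count(*) FROM my_table "
    "WHERE year=2022 AND month=10 AND day=28 OR day=29 OR day=30"
).fetchone()[0]

# The probable intent: three specific days within October 2022.
intended = conn.execute(
    "SELECT count(*) FROM my_table "
    "WHERE year=2022 AND month=10 AND day IN (28, 29, 30)"
).fetchone()[0]

print(as_written, intended)  # prints "5 3"
```

If the intent was three days in October 2022, writing the filter as day IN (28, 29, 30), or wrapping the OR conditions in parentheses, keeps the year and month restrictions applied to all three days.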