amazon-web-services, aws-glue, amazon-athena

How does AWS Athena scale with data scanned size?


I have a table backed by JSON files in S3, partitioned by:

year
month
day
hour

Partition projection is enabled (projection.enabled = true) with standard ranges for these partition keys. Running a query like:

SELECT count(*) FROM my_table WHERE year=2022 and month=10 and day=28 or day=29 or day=30

Took:

  • 8 seconds for one day,
  • 25 seconds for two days,
  • 48 seconds for three days

Initially I expected the time to be constant: I thought Athena would spin up as many "crawlers" as there are files to scan.

Can I predict how this will scale?


Solution

  • While it is very hard to predict how Athena scales, I can say that the V3 engine works much faster than the V2 engine.
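
One thing that may be worth checking before drawing conclusions from the timings: in SQL, AND binds more tightly than OR, so the WHERE clause in the question is parsed as (year=2022 AND month=10 AND day=28) OR day=29 OR day=30. The sketch below illustrates that precedence with Python's built-in sqlite3 module. SQLite is only a stand-in for Athena here, and the tiny in-memory table with one row per partition is a hypothetical example, not the real data.

```python
import sqlite3

# Hypothetical stand-in table: one row per (year, month, day) partition.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (year INT, month INT, day INT)")
rows = [(2022, m, d) for m in (9, 10) for d in range(27, 31)]
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)

# As written in the question. AND binds tighter than OR, so this is parsed as
# (year=2022 AND month=10 AND day=28) OR day=29 OR day=30
unparenthesized = conn.execute(
    "SELECT count(*) FROM my_table "
    "WHERE year=2022 AND month=10 AND day=28 OR day=29 OR day=30"
).fetchone()[0]

# Day filter grouped explicitly, so year and month apply to every day.
grouped = conn.execute(
    "SELECT count(*) FROM my_table "
    "WHERE year=2022 AND month=10 AND day IN (28, 29, 30)"
).fetchone()[0]

print(unparenthesized, grouped)  # prints "5 3"
```

In the ungrouped form, day=29 and day=30 are not restricted by year or month, so the query can match those days in other months as well. With partition projection, that could mean Athena has to consider far more partitions than intended, which might account for part of the faster-than-linear growth. Rerunning the timings with day IN (28, 29, 30) would show whether that is the case.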