Search code examples
spring-bootamazon-athenaendpoint

Endpoint Design: Querying AWS Athena instead of a database


I have an endpoint that I'm building that will be used by 5k - 10k users, say:

GET /magic/spells?spellIds=1,2,3

The underlying data source for 'spells' changes frequently and is stored in a compressed format on s3. The size of the underlying data is fairly small (< 100mb).

I'm querying directly against Athena each time the endpoint is hit by a user in order to retrieve the spell data to be returned in the api response. I'm using a cache to help reduce the number of raw Athena queries, but I'm still concerned that the approach may not scale well.

Athena queries aren't free, and if a user decides to query for randomly generated spellIds to avoid the cache, it could rack up a bill pretty quickly.

Alternatively, I could load the spell data from Athena into a more traditional data source (a database) on a scheduled interval, and query against that instead. The downside to this is that the data may fall out of date, which is a trade-off I may have to make.

Is querying Athena in a user-facing endpoint a common practice, or would it be preferable to store the data in a more traditional data store first?


Solution

  • Athena is designed as an analytical query engine and not a lookup service.

    Load the data into DynamoDB and then query it based on the key. You can use other data stores; however, DynamoDB is the most cost-effective.