I'm building an endpoint that will be used by 5k-10k users, say:
GET /magic/spells?spellIds=1,2,3
The underlying data source for 'spells' changes frequently and is stored in a compressed format on S3. The underlying data is fairly small (< 100 MB).
I'm querying Athena directly each time a user hits the endpoint to retrieve the spell data returned in the API response. I'm using a cache to reduce the number of raw Athena queries, but I'm still concerned the approach may not scale well.
Athena queries aren't free, and a user who queries for randomly generated spellIds to bypass the cache could rack up a bill pretty quickly.
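Roughly, the current read path looks like the sketch below (simplified, using awswrangler; the Glue database name and the 'spells' table are placeholders I've made up for this post). The cache key is the normalized tuple of ids, which is exactly what a cache-busting caller can work around:

```python
from functools import lru_cache

import awswrangler as wr

# Placeholder: the Glue database name is an assumption.
GLUE_DATABASE = "magic"

@lru_cache(maxsize=1024)
def fetch_spells(spell_ids: tuple[int, ...]) -> list[dict]:
    """Query Athena for the given spell ids; results are cached on the normalized key."""
    id_list = ",".join(str(i) for i in spell_ids)
    df = wr.athena.read_sql_query(
        sql=f"SELECT * FROM spells WHERE spell_id IN ({id_list})",
        database=GLUE_DATABASE,
    )
    return df.to_dict("records")

# Sorting and de-duplicating the ids improves the hit rate, but a caller who
# invents random ids still misses the cache and triggers a billable Athena scan.
spells = fetch_spells(tuple(sorted({3, 1, 2})))
```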
Alternatively, I could load the spell data from Athena into a more traditional data source (a database) on a scheduled interval, and query against that instead. The downside is that the data may fall out of date between loads, which is a trade-off I may have to make.
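A rough sketch of what that scheduled load could look like (again simplified; the Glue database, the DynamoDB table name, and running it as a scheduled Lambda are all assumptions on my part):

```python
import json
from decimal import Decimal

import awswrangler as wr
import boto3

# Placeholders: the Glue database and DynamoDB table names are assumptions.
GLUE_DATABASE = "magic"
DDB_TABLE = "spells"

def refresh_spells() -> None:
    """Snapshot the Athena 'spells' table into DynamoDB for cheap key lookups.
    Intended to run on a schedule (e.g. a Lambda triggered by an EventBridge rule)."""
    df = wr.athena.read_sql_query(sql="SELECT * FROM spells", database=GLUE_DATABASE)

    # Round-trip through JSON so numbers arrive as DynamoDB-friendly int/Decimal values.
    items = json.loads(df.to_json(orient="records"), parse_float=Decimal)

    table = boto3.resource("dynamodb").Table(DDB_TABLE)
    with table.batch_writer() as batch:  # batches and retries the writes automatically
        for item in items:
            batch.put_item(Item={k: v for k, v in item.items() if v is not None})
```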
Is querying Athena in a user-facing endpoint a common practice, or would it be preferable to store the data in a more traditional data store first?
Athena is designed as an analytical query engine, not a lookup service: each query scans data in S3, is billed by the amount of data scanned, and typically takes seconds to return, so it's a poor fit behind a user-facing endpoint.

Load the data into DynamoDB on a schedule and query it by key. Other data stores would work too, but for a small, key-addressable dataset like this, DynamoDB is typically the most cost-effective.
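As a minimal sketch of the read path, assuming a table named "spells" with a numeric spell_id partition key (both names are assumptions), BatchGetItem fetches all requested keys in one round trip:

```python
import boto3

# Assumed table: "spells" with partition key "spell_id".
dynamodb = boto3.resource("dynamodb")

def get_spells(spell_ids: list[int]) -> list[dict]:
    """Fetch spells by key. BatchGetItem reads up to 100 items per call,
    so chunk the keys if a single request can exceed that."""
    response = dynamodb.batch_get_item(
        RequestItems={"spells": {"Keys": [{"spell_id": sid} for sid in set(spell_ids)]}}
    )
    items = response["Responses"].get("spells", [])

    # DynamoDB may return unprocessed keys under throttling; retry them.
    unprocessed = response.get("UnprocessedKeys", {})
    while unprocessed:
        response = dynamodb.batch_get_item(RequestItems=unprocessed)
        items.extend(response["Responses"].get("spells", []))
        unprocessed = response.get("UnprocessedKeys", {})

    return items

print(get_spells([1, 2, 3]))
```

Refresh the table from Athena on whatever interval matches how often the underlying data changes; reads then have low-millisecond latency and a predictable per-request cost, and cache-busting spellIds no longer trigger full scans.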