Suppose I have very large files (~25 million records) of objects like the following:
{
"name": "bob",
"age": 26,
"address": "...",
"identifier": "...."
}
I want to be able to index the data, for example by address, for better filtering and searching.
Q: As I understand it, AWS Athena doesn't have an index mechanism. Is that correct?
Q: I want to use the Glue index mechanism. Are partition indexes the only way?
Q: I know that partitioning in Glue can be done by date or by state, for example. How can I index the address key? Can that be achieved in Glue?
Thanks!
AWS Athena is just a query engine, so it doesn't have an index mechanism of its own.
AWS Glue has a partition-index mechanism, but it comes with some restrictions.
Be careful, though: Athena query strings have a length limit, so if you use these tables from Glue jobs with the push_down_predicate feature, you might exceed that limit and the job can fail.
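To make the length concern concrete, here is a minimal sketch that builds a partition predicate and checks its UTF-8 size before it is used. The helper names and the `state` partition column are hypothetical, and the 262144-byte figure is the query string quota listed in the Athena documentation; treat it as an assumption and check the quota that applies to your account.

```python
# Assumed limit: the Athena docs list a maximum query string length
# of 262144 bytes (UTF-8 encoded). Verify this for your own account.
MAX_QUERY_BYTES = 262144


def build_partition_predicate(column, values):
    """Build a predicate such as: state in ('CA','NY')."""
    # Double any single quote so the value stays a valid SQL string literal.
    quoted = ",".join("'" + v.replace("'", "''") + "'" for v in values)
    return f"{column} in ({quoted})"


def fits_limit(predicate, limit=MAX_QUERY_BYTES):
    """Return True if the predicate's UTF-8 size is within the limit."""
    return len(predicate.encode("utf-8")) <= limit


# Hypothetical partition column and values.
predicate = build_partition_predicate("state", ["CA", "NY"])
if not fits_limit(predicate):
    raise ValueError("predicate is too long; filter on fewer partition values")
```

In a Glue job, a string like this is what you would pass as `push_down_predicate` to `glueContext.create_dynamic_frame.from_catalog(...)`, so checking it first lets the job fail early with a clear message.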
Partition-index creation: after adding the index to the table in Glue, enable it for Athena with this table property:
{partition_filtering.enabled:true}
The index will be ready in a couple of minutes. From then on, your Athena queries will be faster whenever the WHERE clause filters on a column covered by the partition index, since the catalog resolves the matching partitions in the background. Without the index, all partitions are listed first for every query, even if the WHERE clause includes a partition column.
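If you prefer to script this rather than use the console, the two steps could look roughly like the sketch below. It assumes a boto3 Glue client is passed in (for example `boto3.client("glue")`); the database, table, index, and key names are hypothetical. Note that a partition index can only be built on the table's partition keys, so a plain column such as address cannot be indexed this way unless the data is partitioned by it.

```python
# Fields of a get_table response that update_table accepts in TableInput.
# Read-only fields such as CreateTime or DatabaseName must be left out.
TABLE_INPUT_KEYS = (
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
)


def create_partition_index(glue_client, database, table, index_name, keys):
    """Ask the Glue Data Catalog to index the given partition keys."""
    glue_client.create_partition_index(
        DatabaseName=database,
        TableName=table,
        PartitionIndex={"Keys": list(keys), "IndexName": index_name},
    )


def enable_partition_filtering(glue_client, database, table):
    """Set partition_filtering.enabled=true on the table, keeping other settings."""
    current = glue_client.get_table(DatabaseName=database, Name=table)["Table"]
    table_input = {k: current[k] for k in TABLE_INPUT_KEYS if k in current}
    params = dict(table_input.get("Parameters", {}))
    params["partition_filtering.enabled"] = "true"
    table_input["Parameters"] = params
    glue_client.update_table(DatabaseName=database, TableInput=table_input)
    return table_input


# Usage (hypothetical names, requires boto3 and AWS credentials):
#   import boto3
#   glue = boto3.client("glue")
#   create_partition_index(glue, "my_db", "people", "state_idx", ["state"])
#   enable_partition_filtering(glue, "my_db", "people")
```

Passing the client in as an argument keeps the functions easy to try against a stub before pointing them at a real catalog.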
Thanks.