I am new to the Elasticsearch world and I'm working on a project that uses Amazon Elasticsearch Service (Elasticsearch and Kibana) to provide a log analytics system for all the CloudWatch logs from different AWS accounts. Setting up the stack and routing the CloudWatch logs is the easy part, but I've noticed that a good indexing strategy comes into play, especially when you have immutable time-series data (logs in this case).

My first approach was to create one daily index for each log group and use an index policy to move or expire old indices based on my requirements, but I realized I would end up with a lot of tiny indices in my Elasticsearch cluster.

Then I considered indexing all the CloudWatch log groups from each AWS account into a single daily index. The problem is that this exceeds the mapping limit (1000 fields), mostly because of CloudTrail and VPC flow logs, and I don't think it is a good idea to increase this limit.

So I've decided to group my logs into a limited number of index types (e.g. CloudTrail logs, VPC flow logs, and other logs). Basically I would have three daily indices for each AWS account, which are relatively larger indices, and I won't have to increase the mapping limit.

I'm sharing this to see if anybody else has implemented something similar and what their thoughts are. I'm still in the initial phase of the project and I am eagerly looking for suggestions and recommendations.
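A minimal sketch of the three-way grouping described above, assuming log groups can be classified by their names. The substring checks, the `cwl-` prefix, and the `log_category` and `daily_index` helpers are all hypothetical, made up for illustration, and would need adapting to real log group names:

```python
from datetime import date


def log_category(log_group: str) -> str:
    """Map a CloudWatch log group name to one of three index types.

    Assumption: the group name contains a recognizable substring.
    """
    name = log_group.lower()
    if "cloudtrail" in name:
        return "cloudtrail"
    if "vpc" in name and "flow" in name:
        return "vpcflow"
    return "other"


def daily_index(account_id: str, log_group: str, day: date) -> str:
    """Build a daily index name: one per account, category, and day."""
    return f"cwl-{account_id}-{log_category(log_group)}-{day:%Y.%m.%d}"
```

With this naming, every log group in an account lands in one of three indices per day, so the number of indices grows with accounts and days rather than with log groups.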
A good indexing strategy is very subjective and depends on a lot of factors, such as the size of each index and how often you are going to query it.
Since we are talking about CloudWatch logs here, you should keep your focus on avoiding lots of small indices. Apart from combining logs of different types, you can also look at combining older indices into weekly or monthly indices. For example, reindex one week's data into a weekly index at the end of the week. Also, make sure you have a retention period defined and are clearing out older indices.
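As a rough sketch of the weekly rollup, the function below builds a request body for the Elasticsearch `_reindex` API that copies seven daily indices into one weekly index. The daily naming pattern (`<prefix>-YYYY.MM.DD`), the weekly naming pattern, and the `weekly_reindex_body` helper are assumptions for illustration. It only builds the body; sending it and deleting the daily indices afterwards are separate steps.

```python
from datetime import date, timedelta


def weekly_reindex_body(prefix: str, any_day_in_week: date) -> dict:
    """Build a _reindex body that merges one week of daily indices.

    Assumption: daily indices are named "<prefix>-YYYY.MM.DD" and the
    week runs Monday to Sunday (ISO week).
    """
    # Step back to the Monday of the week containing the given day.
    monday = any_day_in_week - timedelta(days=any_day_in_week.weekday())
    days = [monday + timedelta(days=i) for i in range(7)]
    iso_year, iso_week, _ = monday.isocalendar()
    return {
        "source": {"index": [f"{prefix}-{d:%Y.%m.%d}" for d in days]},
        "dest": {"index": f"{prefix}-{iso_year}.w{iso_week:02d}"},
    }
```

The resulting dictionary would be sent as the JSON body of `POST _reindex`. If some days have no index, the source list would probably need to be filtered down to the indices that actually exist before the request is sent.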
You can also consider UltraWarm nodes in Amazon Elasticsearch Service, which provide a hot-warm storage architecture that works really well for read-only data like logs.