I'm using CloudSearch to index a large number of small JSON documents that need to be updated regularly (via a 5-minute cron job) based on the value of an expression and some business logic.
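For context, each cron run pushes a document batch to the domain's document endpoint. Here's a minimal sketch of that kind of update, assuming boto3; the endpoint URL, field names, and `compute_rank` helper are placeholders, not my actual code:

```python
import json
import boto3

# Hypothetical document endpoint; yours appears in the CloudSearch
# console as the domain's "Document Endpoint".
DOC_ENDPOINT = "https://doc-my-domain.us-east-1.cloudsearch.amazonaws.com"

doc_client = boto3.client("cloudsearchdomain", endpoint_url=DOC_ENDPOINT)

def compute_rank(item):
    # Placeholder for the business logic that derives the ranking value.
    return item.get("score", 0)

def push_updates(items):
    # "add" operations also overwrite existing documents with the same id,
    # so the same batch shape works for both inserts and updates.
    batch = [
        {
            "type": "add",
            "id": item["id"],
            "fields": {"title": item["title"], "rank": compute_rank(item)},
        }
        for item in items
    ]
    return doc_client.upload_documents(
        documents=json.dumps(batch).encode("utf-8"),
        contentType="application/json",
    )
```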
Everything was working until last week. The cron job is still running without any error messages and the objects in S3 are still being updated correctly, but when I execute a CloudSearch request sorted by the ranking property, I get stale search data, off not just by a couple of minutes but by weeks.
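The query that comes back stale looks roughly like this (a sketch assuming boto3, a hypothetical search endpoint, and a ranking field named `rank`):

```python
import boto3

# Hypothetical search endpoint; shown in the console as "Search Endpoint".
SEARCH_ENDPOINT = "https://search-my-domain.us-east-1.cloudsearch.amazonaws.com"

search_client = boto3.client("cloudsearchdomain", endpoint_url=SEARCH_ENDPOINT)

response = search_client.search(
    query="matchall",
    queryParser="structured",  # 'matchall' is structured-query syntax
    sort="rank desc",          # order results by the ranking field
    size=10,
)
for hit in response["hits"]["hit"]:
    print(hit["id"], hit["fields"])
```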
I tried re-indexing, but that did not result in any change. Does CloudSearch have some sort of update threshold that stops you from posting updates after a certain number of requests per day? I would imagine that updating once every 5 minutes falls well below any such limit.
I haven't been able to find any indication in AWS's docs as to whether they do any sort of update throttling.
What I eventually found was this entry in CloudSearch's FAQ: http://aws.amazon.com/cloudsearch/faqs/
> Q: How much data can I upload to my search domain?
>
> The number of partitions you need depends on your data and configuration, so the maximum data you can upload is the data set that, when your search configuration is applied, results in 10 search partitions. When you exceed your search partition limit, your domain will stop accepting uploads until you delete documents and re-index your domain. If you need more than 10 search partitions, please contact us.
I deleted a large amount of data from CloudSearch that I was no longer using and found my updates working again. I had unwittingly run into the data limit of CloudSearch's index: CloudSearch was still returning a success response when I submitted each batch update, but it silently ignored the update operations.
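Because the upload call keeps returning success in this state, the failure is easy to miss. One way to catch it earlier (a sketch, assuming boto3 and a hypothetical domain name) is to monitor the domain's search partition count against the 10-partition ceiling quoted in the FAQ:

```python
import boto3

# The configuration API (not the domain's doc/search endpoints)
# reports partition counts.
cs = boto3.client("cloudsearch", region_name="us-east-1")

MAX_SEARCH_PARTITIONS = 10  # default ceiling quoted in the FAQ

status = cs.describe_domains(DomainNames=["my-domain"])["DomainStatusList"][0]
partitions = status["SearchPartitionCount"]

if partitions >= MAX_SEARCH_PARTITIONS:
    # At the ceiling, CloudSearch stops accepting uploads even though
    # upload_documents still returns a success response.
    raise RuntimeError(
        f"Domain at {partitions}/{MAX_SEARCH_PARTITIONS} search partitions; "
        "uploads may be silently ignored until documents are deleted "
        "and the domain is re-indexed."
    )
```

A scheduled check like this would have flagged the problem weeks before the stale results showed up.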