During indexing, is it possible to respond 200-OK to the client without syncing the data between the primary and replica shard?

I'm looking for an answer to the following question: "During indexing, is it possible to respond 200-OK to the client without syncing the data between the primary and replica shard?" ChatGPT/OpsGPT says you can use the index.write.wait_for_active_shards parameter for that purpose. But when I test it on an index with one primary and one replica, the HTTP response reports both the primary and the replica shard as successful, which means the write operation was acknowledged by both of them. (If you test this on your end, please make sure the Elasticsearch cluster has at least 2 data nodes.) I researched and read lots of articles, such as tracking-in-sync-shard-copies, but still can't find the answer to the above question. My main concern is tuning for indexing speed: I want to answer the indexer (client) without waiting for acknowledgment from the replica shard. The goal is to reach the indexing speed I get when there are no replicas. Please note that I am aware that this may result in data loss, and I am aware of the official documentation on tuning for indexing speed.

Solution

ChatGPT is technically correct here, but its answer requires a few clarifications. First of all, code 200 is only sent back when you update an existing document. When a document with the given ID is created for the first time, you get back 201.
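To make the 200 vs. 201 distinction concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Elasticsearch client code: the function name index_doc and the dict-based store are made up for illustration, and only the status/result pairing (201 with "created", 200 with "updated") reflects what the index API reports.

```python
from typing import Dict, Tuple


def index_doc(store: Dict[str, dict], doc_id: str, source: dict) -> Tuple[int, str]:
    """Toy model: return (http_status, result) for indexing `source` under `doc_id`.

    `store` stands in for an index; it is a hypothetical helper, not a real API.
    """
    created = doc_id not in store  # first write for this ID?
    store[doc_id] = source         # create or overwrite the document
    return (201, "created") if created else (200, "updated")


if __name__ == "__main__":
    store: Dict[str, dict] = {}
    print(index_doc(store, "1", {"field": "a"}))  # first write for ID "1"
    print(index_doc(store, "1", {"field": "b"}))  # second write for the same ID
```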

But leaving technicalities aside and assuming that you meant 2xx-range codes, the answer is yes if and only if not all of your replicas were available when the primary started processing the request. At the beginning of the operation, the primary shard checks which replicas are available and then waits for replication to complete on all of them. There are rules that govern what happens if not all replicas are available, and these rules are configurable through the index.write.wait_for_active_shards setting that you linked. That's why you had to insist on making "sure the elasticsearch has at least 2 data nodes". If you had just one node, you would have gotten back the 2xx response from just the primary, which would have answered your question.
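The behavior described above can be sketched as a small Python model. This is a simplification under stated assumptions, not Elasticsearch source code: the names ShardsHeader, NotEnoughActiveShards and index_document are invented for the example, and the real server waits up to a timeout before giving up instead of failing immediately. The point it illustrates is that wait_for_active_shards is only a check made before the write starts, while replication itself always goes to every replica that is available at that moment.

```python
from dataclasses import dataclass


@dataclass
class ShardsHeader:
    """Mirrors the `_shards` object in an index response."""
    total: int       # primary plus all configured replicas
    successful: int  # copies that acknowledged the write
    failed: int      # copies that were attempted but failed


class NotEnoughActiveShards(Exception):
    """Hypothetical stand-in for the error returned when the pre-check fails."""


def index_document(number_of_replicas: int,
                   available_replicas: int,
                   wait_for_active_shards: int = 1) -> ShardsHeader:
    """Toy model of how a primary handles a single write."""
    if not 0 <= available_replicas <= number_of_replicas:
        raise ValueError("available_replicas must be between 0 and number_of_replicas")

    total = 1 + number_of_replicas
    active = 1 + available_replicas  # the primary plus the available replicas

    # Pre-flight check only: are enough copies active to start the write?
    if active < wait_for_active_shards:
        raise NotEnoughActiveShards(
            f"need {wait_for_active_shards} active copies, have {active}")

    # The write is then replicated synchronously to EVERY available copy,
    # regardless of how low wait_for_active_shards was set.
    return ShardsHeader(total=total, successful=active, failed=0)


if __name__ == "__main__":
    # Two data nodes: the replica is available, so both copies acknowledge.
    print(index_document(number_of_replicas=1, available_replicas=1))
    # One data node: the replica is unassigned, so only the primary acknowledges.
    print(index_document(number_of_replicas=1, available_replicas=0))
```

In this model, lowering wait_for_active_shards never reduces the successful count; only the number of replicas available when the request starts does, which matches what the two-node test in the question showed.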

I think the behavior you are seeking is async replication. It was available in Elasticsearch prior to v2.0.0 but was later removed, since it was causing more harm than good and was preventing the team from implementing new features. Turning on asynchronous replication doesn't solve the server throughput problem: the servers still have to do the same amount of work, and if they cannot keep up they will eventually slow everything down anyway. So whatever performance problem you are currently having needs to be solved in some other way.