database, vector-database, milvus

In version 2.4.0, there are occasional inconsistencies in query results


After building a vector collection in Milvus 2.4.0, we split the data into chunks, inserted it, and ran queries against it later. We found that submitting the same query sometimes returns different results, and occasionally the results are not the expected answers. How should we troubleshoot this? Milvus is deployed in standalone mode, and no operations were performed on the collection between queries.


Solution

    • When Milvus receives an insert request, the data is not immediately searchable, because it first enters the message queue as a write-ahead log. The data becomes searchable only after the QueryNode asynchronously consumes it from the message queue, which can take a few seconds. The consistency_level parameter of the search interface (Strong, Bounded, Session, or Eventually) controls how much of this recently inserted data a search is guaranteed to see.
    • Once the QueryNode receives the data, it doesn’t immediately write it to disk. Instead, the data accumulates in a buffer as a growing segment, which has a temporary index (IVF_FLAT). Typically, the buffer is written to disk after a few minutes, at which point it becomes a sealed segment. Search results during this period may differ slightly from those obtained afterwards, because the sealed segment rebuilds its index using the index type configured for the collection.
    • The sealed segment written to disk will also undergo compaction, where multiple small segments are merged into a larger segment, and the merged segment will also rebuild its index. Due to the data merging and index rebuilding, the search results before and after merging may have slight differences.

    Apart from these factors, which can cause minor differences in search results, the results should in principle be stable as long as the data and the search parameters are unchanged, unless there is a bug.
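
    One way to check whether the differences you see are of the "minor" kind described above is to repeat the same search several times and measure how much the returned IDs overlap. The following is only a sketch: `jaccard`, `stability_report`, and `make_milvus_searcher` are hypothetical helper names, and the host, port, vector field name, and search parameters are assumptions that you would need to adapt to your own collection. The pymilvus part relies on `Collection.search` accepting a `consistency_level` keyword.

```python
from typing import Callable, Hashable, List, Sequence


def jaccard(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Order-insensitive overlap of two ID lists: |A & B| / |A | B|."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty result sets are considered identical
    return len(set_a & set_b) / len(set_a | set_b)


def stability_report(
    search_ids: Callable[[], Sequence[Hashable]], runs: int = 5
) -> List[float]:
    """Run the same search `runs` times and compare each later run to the first."""
    if runs < 2:
        raise ValueError("runs must be at least 2")
    baseline = list(search_ids())
    return [jaccard(baseline, search_ids()) for _ in range(runs - 1)]


def make_milvus_searcher(collection_name, query_vector, search_params,
                         anns_field="embedding", top_k=10):
    """Hypothetical wiring to pymilvus; all connection details are assumptions."""
    # Imported lazily so the helpers above work without pymilvus installed.
    from pymilvus import Collection, connections

    connections.connect(alias="default", host="localhost", port="19530")
    collection = Collection(collection_name)
    collection.load()

    def search_ids():
        results = collection.search(
            data=[query_vector],
            anns_field=anns_field,      # assumed vector field name
            param=search_params,        # must match the metric of your index
            limit=top_k,
            consistency_level="Strong",  # rule out data-visibility delay
        )
        return list(results[0].ids)

    return search_ids
```

    With a searcher built by `make_milvus_searcher`, calling `stability_report(searcher, runs=10)` returns one overlap score per repeated run. Scores at or near 1.0 that only shift around a flush or compaction are consistent with the index rebuilds described above. Large or persistent drops with unchanged data and parameters are worth reporting as a possible bug.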