I am trying to create an index on text embeddings for a RAG system, using Milvus 2.5.x as the vector database in Python. I have already created the collections and populated them. My dataset is quite small because this is a research project: one collection has 500 rows and the other has 53 rows.
My current setup is as follows:
from pymilvus import MilvusClient
client = MilvusClient('../data/task_embeddings.db')
client.load_collection('collection')
client.drop_index('collection', 'problem_statement_embeddings') # Ensure clean precondition before trying to create index
client.describe_index('collection', 'problem_statement_embeddings') # Check whether the last statement worked as expected
index_params = MilvusClient.prepare_index_params()
index_params.add_index(
    index_name='problem_statement_embeddings',
    field_name="vector",
    index_type="FLAT",
    metric_type="COSINE",
)
client.create_index('collection', index_params, sync=True)
This code runs without errors. However, when I then check the index with client.describe_index('collection', 'problem_statement_embeddings'),
I get the following output:
{'index_type': 'FLAT',
'metric_type': 'COSINE',
'dim': '768',
'field_name': 'vector',
'index_name': 'problem_statement_embeddings',
'total_rows': 0,
'indexed_rows': 0,
'pending_index_rows': 0,
'state': 'Finished'}
This indicates that no rows were indexed. If I run a search query, I still get a result. I suppose that at my dataset size it does not matter much whether the data is indexed, but I would still like to understand what is going on here so that I don't run into unexpected behaviour later.
Edit: I have opened an issue in the Milvus repo
This happens because FLAT is a brute-force search: no index structure is built, so no rows are reported as indexed. According to the maintainers, this is expected behaviour. However, it is pretty poor UX that this isn't documented.
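To illustrate what "brute-force" means here, below is a minimal pure-Python sketch of a FLAT-style search with the COSINE metric. It is not Milvus code: the names cosine_similarity, flat_search, and stored_vectors are hypothetical and made up for this example. It only shows the idea that every stored vector is scored against the query at search time, so nothing needs to be built in advance.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def flat_search(query, vectors, limit=3):
    """Brute-force top-k search: score every stored vector, then sort.

    No index structure is prepared beforehand; all the work happens
    at query time. (Hypothetical helper, not part of pymilvus.)
    """
    scored = [
        (row_id, cosine_similarity(query, vec))
        for row_id, vec in enumerate(vectors)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy data standing in for the embeddings stored in a collection.
stored_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]

results = flat_search([1.0, 0.0, 0.0], stored_vectors, limit=2)
print(results)  # list of (row_id, similarity) pairs, best match first
```

With only a few hundred rows, scanning everything like this is cheap, which is consistent with the search still returning results even though the index reports zero indexed rows.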