python flask milvus

Is PyMilvus client thread-safe & fork-safe?


I'm thinking about using Milvus vector storage in my Flask-based project and have been looking at the PyMilvus (Python SDK) documentation. I haven't found any information yet about:

  • Is PyMilvus thread-safe?
  • Is PyMilvus fork-safe?
  • How does connection pooling work in the SDK?

Could you help me sort this out?

The official documentation doesn't say much about these points.


Solution

  • As of v2.3.x, PyMilvus doesn't provide a thread pool or a connection pool. Instead, it has a global object, "connections", that maintains client-to-server connections.

    Call connections.connect() to create a connection:

    from pymilvus import connections

    # HOST and PORT are placeholders for your Milvus server address.
    connections.connect(host=HOST, port=PORT, alias="xxx")
    

    The "alias" parameter is the name of the connection. Internally, the "connections" object maintains a map from name to connection. If you don't provide "alias", the connection is named "default".
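
To make the name-to-connection idea concrete, here is a toy model of such a registry. It is not PyMilvus code and makes no claim about how the SDK implements this internally; the class name ConnectionRegistry, its methods, and the host/port values are all made up for illustration.

```python
class ConnectionRegistry:
    """Toy stand-in for an alias-to-connection map (NOT PyMilvus code)."""

    DEFAULT_ALIAS = "default"

    def __init__(self):
        self._by_alias = {}

    def connect(self, alias=DEFAULT_ALIAS, **params):
        # Store the connection parameters under the given name.
        self._by_alias[alias] = dict(params)

    def get(self, using=DEFAULT_ALIAS):
        # Look the connection up by name; an unknown name is an error.
        try:
            return self._by_alias[using]
        except KeyError:
            raise KeyError(f"no connection registered under alias {using!r}") from None


registry = ConnectionRegistry()
# Hypothetical addresses, chosen only to tell the two entries apart.
registry.connect(host="localhost", port="19530")               # stored as "default"
registry.connect(alias="xxx", host="localhost", port="19531")  # stored as "xxx"
```

Calling registry.get() returns the entry stored as "default", while registry.get(using="xxx") returns the one stored as "xxx". A missing or wrong name selects a different connection or raises an error.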

    When you declare a collection, the "using" parameter selects a connection by name. If you don't provide "using", the "default" connection is used. All of the Collection's methods then go through this connection.

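    from pymilvus import Collection

    # Every call on this Collection goes through the connection named "xxx".
    collection = Collection(name=collection_name, using="xxx")
    collection.insert()
    collection.search()
    ......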
    

    The connection object is thread-safe, so you can call a collection's methods from different threads. It cannot be shared across processes, however. If you fork a subprocess, make sure each subprocess creates its own connection by calling connections.connect().
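
One way to follow the per-process advice above is to remember which process last connected and to reconnect whenever the current PID differs. The sketch below is a minimal illustration, not part of PyMilvus: PerProcessConnection and its ensure() method are hypothetical names, and the connect function is passed in as an argument so the helper does not depend on the SDK.

```python
import os


class PerProcessConnection:
    """Calls `connect` at most once per process (hypothetical helper)."""

    def __init__(self, connect, **connect_kwargs):
        self._connect = connect
        self._kwargs = connect_kwargs
        self._pid = None  # PID of the process that last connected

    def ensure(self):
        """Connect if this process has not done so; return True if it did."""
        pid = os.getpid()
        if self._pid == pid:
            return False  # this process already created its own connection
        # Either the first call, or we are in a forked child that inherited
        # the parent's state: create a connection owned by this process.
        self._connect(**self._kwargs)
        self._pid = pid
        return True
```

With PyMilvus this could be set up as guard = PerProcessConnection(connections.connect, alias="xxx", host=HOST, port=PORT), with guard.ensure() called before the first Milvus operation in each worker, for example from a Flask before_request hook. The sketch assumes the parent process has not already connected under the same alias. If it has, check against your PyMilvus version whether a second connect() call actually replaces the inherited connection. Two threads hitting ensure() at the same moment may both call connect once, which the sketch does not guard against.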