python langchain chromadb vector-database

How can I save a Chroma DB containing vector embeddings to avoid recomputing them?

I am generating a Chroma DB that holds vector embeddings for several PDF documents, and I want to store it so the embeddings are not recomputed every time I run with different inputs. Pickling and JSON serialization do not seem to work for the Chroma object, and importing it from another file also makes the embedding script run again.

Solution

You can pass a persist_directory when creating a Chroma vector store through LangChain:

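# Import paths vary by LangChain version (newer releases: langchain_openai, langchain_chroma).
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

persist_directory = 'db'  # folder where Chroma writes its files

embedding = OpenAIEmbeddings()
vectordb = Chroma.from_documents(documents=texts, embedding=embedding, persist_directory=persist_directory)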

This stores the embeddings on disk inside a folder named db.

The next time you need the database, load it from disk instead of recomputing the embeddings:

vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)
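To tie the two calls together, here is a minimal sketch of a load-or-build helper. It is an illustration, not part of LangChain: has_persisted_index and load_or_build are hypothetical names, build_documents is an assumed callable that returns your split documents, and the import path shown is the older one (newer LangChain releases expose Chroma from langchain_chroma). The check for a non-empty folder is a simple heuristic and does not validate the stored data.

```python
import os


def has_persisted_index(persist_directory: str) -> bool:
    """Heuristic: treat an existing, non-empty folder as a saved index."""
    return os.path.isdir(persist_directory) and bool(os.listdir(persist_directory))


def load_or_build(persist_directory, embedding, build_documents):
    """Load the store from disk if present; otherwise embed and persist it.

    build_documents is a hypothetical zero-argument callable that returns
    the split documents. It is only called when no saved index is found.
    """
    # Imported lazily; the path depends on the installed LangChain version.
    from langchain.vectorstores import Chroma

    if has_persisted_index(persist_directory):
        # Reuses stored vectors; embedding is only needed for new queries.
        return Chroma(
            persist_directory=persist_directory,
            embedding_function=embedding,
        )

    # First run: compute embeddings and write them to persist_directory.
    return Chroma.from_documents(
        documents=build_documents(),
        embedding=embedding,
        persist_directory=persist_directory,
    )
```

Calling load_or_build from inside a function or under an `if __name__ == "__main__":` guard, rather than at module top level, should also keep the embedding step from running again when the file is imported elsewhere.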