python langchain chromadb vector-database

How can I save a dictionary of Chroma DB vector embeddings to avoid computing them again?


I am generating a Chroma DB that holds vector embeddings for different PDF documents, and I want to store it to avoid recomputing the embeddings every time for different inputs. Pickling and JSON serialization do not seem to work for the Chroma object, and importing it from another file also makes the embedding script run again.


Solution

  • You can pass a persist_directory when using ChromaDB with LangChain:

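    # Import paths vary by LangChain version (newer releases use langchain_community / langchain_openai).
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import Chroma

    persist_directory = 'db'
    # texts: the already-split documents produced by your PDF loading step
    embedding = OpenAIEmbeddings()
    vectordb = Chroma.from_documents(documents=texts, embedding=embedding, persist_directory=persist_directory)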
    

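    This will store the embedding results inside a folder named db. With older Chroma releases (before 0.4) you may also need to call vectordb.persist() to flush the data to disk; newer releases write it automatically.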

    The next time you need to access the db, simply load it from disk like so:

    vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)
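    To avoid re-running the embedding step by accident, the two calls above can be combined into a "load if present, otherwise build" helper. The following is only a minimal sketch: load_or_build and its build/load parameters are hypothetical names introduced here, not part of LangChain or Chroma, and the check assumes that a non-empty persist directory means a store was already written there.

```python
import os
from typing import Any, Callable


def load_or_build(
    persist_directory: str,
    build: Callable[[], Any],
    load: Callable[[], Any],
) -> Any:
    """Hypothetical helper: reuse a persisted store if one seems to exist.

    Assumption: a non-empty persist_directory means embeddings were
    already written there, so load() is called instead of build().
    """
    if os.path.isdir(persist_directory) and os.listdir(persist_directory):
        return load()   # cheap: reads the stored embeddings from disk
    return build()      # expensive: computes embeddings and persists them


# Usage with the calls from the answer (left commented out because it
# needs langchain, chromadb and an OpenAI API key to run):
#
# vectordb = load_or_build(
#     persist_directory,
#     build=lambda: Chroma.from_documents(
#         documents=texts, embedding=embedding,
#         persist_directory=persist_directory),
#     load=lambda: Chroma(
#         persist_directory=persist_directory,
#         embedding_function=embedding),
# )
```

    Because the embedding work only happens inside build(), importing a module that defines this helper does not by itself trigger any computation.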