chromadb, vector-database

How to avoid recomputing embeddings with Chroma


I have a Chroma database with over 50k documents.

1. I called vectordb.get(include=['embeddings', 'documents', 'metadatas']) to retrieve all the documents, IDs, and so on stored in Chroma.
2. I did some filtering and ended up with a subset of what .get() returns.
3. I want to create another Chroma retriever containing only those records.

The problem is that I can't find a way to avoid recalculating the embeddings.

I searched the documentation, but the only function that creates a vector DB from LangChain documents is from_documents, and it expects an embedding function. Basically, it will recalculate the embeddings, but I already have them.


Solution

  • You need to include "embeddings" in the include list of your get call:

     data = vectordb.get(include=['embeddings', 'documents', 'metadatas'])
    

    Then you can pass data['embeddings'] when adding records to your new collection, so the stored vectors are reused instead of being recomputed:

    collection.add(embeddings=data['embeddings'], ...)
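
    For completeness, here is a minimal sketch of the filter-then-copy step. It assumes data has the shape returned by the get call above (parallel sequences under 'ids', 'embeddings', 'documents' and 'metadatas') and that collection is any object exposing Chroma's collection.add(ids=..., embeddings=..., documents=..., metadatas=...). The helper names filter_records and copy_into, the keep predicate, and the batch_size default are hypothetical and only there for illustration; batching is an assumption to stay under any per-call size limit your Chroma version may enforce.

```python
from typing import Any, Callable, Dict, List


def filter_records(
    data: Dict[str, Any],
    keep: Callable[[str, Dict[str, Any]], bool],
) -> Dict[str, List[Any]]:
    """Return only the records for which keep(document, metadata) is true.

    `data` is assumed to hold parallel sequences under the keys
    'ids', 'embeddings', 'documents' and 'metadatas'.
    """
    subset: Dict[str, List[Any]] = {
        "ids": [], "embeddings": [], "documents": [], "metadatas": [],
    }
    for i, doc in enumerate(data["documents"]):
        # A record may have no metadata; give the predicate an empty dict then.
        meta = data["metadatas"][i] or {}
        if keep(doc, meta):
            subset["ids"].append(data["ids"][i])
            subset["embeddings"].append(data["embeddings"][i])
            subset["documents"].append(doc)
            subset["metadatas"].append(data["metadatas"][i])
    return subset


def copy_into(collection: Any, subset: Dict[str, List[Any]],
              batch_size: int = 5000) -> int:
    """Add the stored vectors to `collection` in batches.

    Because embeddings are passed explicitly, the add call has nothing
    to embed. Returns the number of records handed to the collection.
    """
    total = len(subset["ids"])
    for start in range(0, total, batch_size):
        end = start + batch_size
        collection.add(
            ids=subset["ids"][start:end],
            embeddings=subset["embeddings"][start:end],
            documents=subset["documents"][start:end],
            metadatas=subset["metadatas"][start:end],
        )
    return total


# Hypothetical usage (the metadata key "source" is made up for this sketch):
#   subset = filter_records(data, lambda doc, meta: meta.get("source") == "faq")
#   copy_into(collection, subset)
```

    If you then want a LangChain retriever over the new collection, my understanding is that the Chroma wrapper can be pointed at an existing collection through its client and collection_name arguments. An embedding function is still needed there, but only to embed incoming queries, not the documents that are already stored. Check this against the version you have installed.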