I have a Chroma DB with over 50k documents.
1 - I called vectordb.get(include=['embeddings', 'documents', 'metadatas'])
to get all the documents, IDs, etc. stored in Chroma.
2 - I did some filtering and ended up with a subset of what .get() returns.
3 - I want to create another Chroma retriever that contains only the filtered records.
The problem is that I can't find a way to avoid recalculating the embeddings.
I searched the documentation, but the only function I found that creates a vector DB from LangChain documents is from_documents,
and it expects an embedding function, so it would recalculate the embeddings even though I already have them.
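Make sure "embeddings" is in the include list of your get() call, because Chroma does not return embeddings by default (IDs are always returned):
data = vectordb.get(include=['embeddings', 'documents', 'metadatas'])
Then the vectors are available in data['embeddings'], aligned by index with data['ids'], data['documents'] and data['metadatas'].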
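When filtering (step 2 in the question), the four parallel lists must stay aligned, otherwise an ID ends up paired with someone else's embedding. A minimal sketch, assuming data is the dict returned by get() above; filter_get_result and the keep predicate are hypothetical names, not part of Chroma or LangChain. Each vector is converted to a plain list because newer chromadb versions may return embeddings as a NumPy array.

```python
from typing import Any, Callable


def _as_list(vec: Any) -> list:
    # NumPy rows have .tolist(); plain Python lists are copied as-is.
    return vec.tolist() if hasattr(vec, "tolist") else list(vec)


def filter_get_result(data: dict, keep: Callable[[str, Any, Any], bool]) -> dict:
    """Return a subset of a Chroma get() result containing only the records
    for which keep(id, document, metadata) is True.

    All four lists are indexed with the same positions, so every kept ID
    still matches its own embedding, document and metadata."""
    ids = data["ids"]
    docs = data["documents"]
    metas = data["metadatas"]
    embs = data["embeddings"]
    # Positions of the records to keep (no truthiness checks on embs,
    # which would fail if it is a NumPy array).
    idx = [i for i in range(len(ids)) if keep(ids[i], docs[i], metas[i])]
    return {
        "ids": [ids[i] for i in idx],
        "embeddings": [_as_list(embs[i]) for i in idx],
        "documents": [docs[i] for i in idx],
        "metadatas": [metas[i] for i in idx],
    }
```

For example, filter_get_result(data, lambda _id, doc, meta: (meta or {}).get("source") == "report.pdf") keeps only records whose metadata has that (hypothetical) source value.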
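To add them to your new collection, pass the vectors explicitly; when embeddings= is given, Chroma stores them as-is instead of computing new ones:
collection.add(ids=data['ids'], embeddings=data['embeddings'], documents=data['documents'], metadatas=data['metadatas'])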
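With tens of thousands of records, a single add() call can exceed chromadb's maximum batch size, so it can help to send the records in chunks. A minimal sketch; add_precomputed is a hypothetical helper, and batch_size=5000 is an assumed value meant to stay below that limit, not a documented constant. The collection argument is any object with chromadb's add(ids=, embeddings=, documents=, metadatas=) method.

```python
from typing import Any


def add_precomputed(collection: Any, data: dict, batch_size: int = 5000) -> int:
    """Add records with already-computed embeddings to a Chroma collection.

    Because embeddings= is passed explicitly, the collection's embedding
    function is not called for these records. Returns the number of
    records sent."""
    n = len(data["ids"])
    for start in range(0, n, batch_size):
        end = start + batch_size
        collection.add(
            ids=list(data["ids"][start:end]),
            # Convert NumPy rows (if any) to plain lists of floats.
            embeddings=[
                e.tolist() if hasattr(e, "tolist") else list(e)
                for e in data["embeddings"][start:end]
            ],
            documents=list(data["documents"][start:end]),
            metadatas=list(data["metadatas"][start:end]),
        )
    return n
```

To use the result as a LangChain retriever, one option (assuming the langchain_chroma package) is to create the target collection through a chromadb client, e.g. client.get_or_create_collection("filtered") with a hypothetical name, fill it with add_precomputed, and then wrap it with Chroma(client=client, collection_name="filtered", embedding_function=your_embeddings).as_retriever(). The embedding function is then only used to embed queries, so it must be the same model that produced the stored vectors.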