python langchain sentence-transformers chromadb

AttributeError: 'SentenceTransformer' object has no attribute 'embed_documents' and I cannot modify anything in the library


I'm trying to build a RAG pipeline on top of a Chroma database, but creating the vector store fails with the following error: AttributeError: 'SentenceTransformer' object has no attribute 'embed_documents'. I have seen suggestions to fix this by patching the Chroma library directly, but I don't have permission to modify installed packages in my environment. Any advice would be appreciated.

The ultimate goal is to use the index as a query engine for a chatbot. This is what I tried:

Code:

from langchain_community.document_loaders import DataFrameLoader
from langchain_community.vectorstores import Chroma
from sentence_transformers import SentenceTransformer

# Load the text chunks and declare which column is to be embedded
chunks = DataFrameLoader(final_df_for_chroma_injection,
                         page_content_column='TEXT').load()

# Create the open-source embedding model
embedding_model = SentenceTransformer('sentence-transformers/all-MiniLM-L12-v2')

#-Load the persist directory on which are stored the previous embeddings
#-And add the new ones from chunks/embeddings
index = Chroma.from_documents(chunks,
                              embedding_model,
                              persist_directory="./chroma_db")

This is the error I get:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[47], line 3
      1 #-Load the persist directory on which are stored the previous embeddings
      2 #-And add the new ones from chunks/embeddings
----> 3 index = Chroma.from_documents(chunks,
      4                              embedding_model,
      5                              persist_directory="./chroma_db")

File /opt/anaconda3_envs/abeille_pytorch_p310/lib/python3.10/site-packages/langchain_community/vectorstores/chroma.py:778, in Chroma.from_documents(cls, documents, embedding, ids, collection_name, persist_directory, client_settings, client, collection_metadata, **kwargs)
    776 texts = [doc.page_content for doc in documents]
    777 metadatas = [doc.metadata for doc in documents]
--> 778 return cls.from_texts(
    779     texts=texts,
    780     embedding=embedding,
    781     metadatas=metadatas,
    782     ids=ids,
    783     collection_name=collection_name,
    784     persist_directory=persist_directory,
    785     client_settings=client_settings,
    786     client=client,
    787     collection_metadata=collection_metadata,
    788     **kwargs,
    789 )

File /opt/anaconda3_envs/abeille_pytorch_p310/lib/python3.10/site-packages/langchain_community/vectorstores/chroma.py:736, in Chroma.from_texts(cls, texts, embedding, metadatas, ids, collection_name, persist_directory, client_settings, client, collection_metadata, **kwargs)
    728     from chromadb.utils.batch_utils import create_batches
    730     for batch in create_batches(
    731         api=chroma_collection._client,
    732         ids=ids,
    733         metadatas=metadatas,
    734         documents=texts,
    735     ):
--> 736         chroma_collection.add_texts(
    737             texts=batch[3] if batch[3] else [],
    738             metadatas=batch[2] if batch[2] else None,
    739             ids=batch[0],
    740         )
    741 else:
    742     chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)

File /opt/anaconda3_envs/abeille_pytorch_p310/lib/python3.10/site-packages/langchain_community/vectorstores/chroma.py:275, in Chroma.add_texts(self, texts, metadatas, ids, **kwargs)
    273 texts = list(texts)
    274 if self._embedding_function is not None:
--> 275     embeddings = self._embedding_function.embed_documents(texts)
    276 if metadatas:
    277     # fill metadatas with empty dicts if somebody
    278     # did not specify metadata for all texts
    279     length_diff = len(texts) - len(metadatas)

File /opt/anaconda3_envs/abeille_pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py:1688, in Module.__getattr__(self, name)
   1686     if name in modules:
   1687         return modules[name]
-> 1688 raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")

AttributeError: 'SentenceTransformer' object has no attribute 'embed_documents'


Solution

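  • As the traceback shows, Chroma calls embed_documents on whatever object it receives as the embedding. A raw SentenceTransformer model does not have that method (it exposes encode), so it needs a LangChain embeddings wrapper. Use SentenceTransformerEmbeddings instead of SentenceTransformer, or simply HuggingFaceEmbeddings; both can be imported from langchain_community.embeddings, so nothing in the library has to be modified.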

    Reference: https://python.langchain.com/docs/integrations/text_embedding/sentence_transformers
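
A minimal sketch of what that could look like, assuming the langchain_community version from the traceback. The names load_langchain_embeddings and EncodeAdapter are hypothetical helpers made up for this example, not part of LangChain. Option 1 is the wrapper suggested above. Option 2 is a small duck-typed adapter around the raw model; it relies on the assumption that Chroma only needs embed_documents (seen in the traceback) and embed_query from the object it is given.

```python
from typing import Iterable, List


def load_langchain_embeddings(
    model_name: str = "sentence-transformers/all-MiniLM-L12-v2",
):
    """Option 1: let LangChain wrap the sentence-transformers model."""
    # Imported lazily so the rest of this sketch loads without LangChain.
    from langchain_community.embeddings import HuggingFaceEmbeddings

    return HuggingFaceEmbeddings(model_name=model_name)


class EncodeAdapter:
    """Option 2 (hypothetical helper): wrap any object that has .encode().

    Assumption: the vector store only calls embed_documents and embed_query
    on the embedding object.
    """

    def __init__(self, model):
        # e.g. SentenceTransformer('sentence-transformers/all-MiniLM-L12-v2')
        self.model = model

    def embed_documents(self, texts: Iterable[str]) -> List[List[float]]:
        # encode() takes a list of strings and returns one vector per string;
        # convert every value to a plain Python float.
        vectors = self.model.encode(list(texts))
        return [[float(x) for x in vec] for vec in vectors]

    def embed_query(self, text: str) -> List[float]:
        # A query is embedded like a one-document batch.
        return self.embed_documents([text])[0]
```

With either object, the call from the question stays the same: set embedding_model = load_langchain_embeddings() (or EncodeAdapter(SentenceTransformer(...))) and pass it to Chroma.from_documents(chunks, embedding_model, persist_directory="./chroma_db").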