I am trying to get a simple vector store (chromadb) to embed texts using the add_texts method with LangChain, but I get the following error, despite having successfully used the OpenAI package in a different simple LangChain scenario:
ValueError: You must provide embeddings or a function to compute them
Code:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
db = Chroma()
texts = [
"""
One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query. A vector store takes care of storing embedded data and performing vector search for you.
""",
"""
Today's applications are required to be highly responsive and always online. To achieve low latency and high availability, instances of these applications need to be deployed in datacenters that are close to their users. Applications need to respond in real time to large changes in usage at peak hours, store ever increasing volumes of data, and make this data available to users in milliseconds.
""",
]
db.add_texts(texts, embedding_function=OpenAIEmbeddings())  # this line raises the ValueError above
The embedding_function needs to be passed when you construct the Chroma object.
Source: Chroma class code
So your code would be:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
db = Chroma(embedding_function=OpenAIEmbeddings())  # pass the embedding function at construction time
texts = [
"""
One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query. A vector store takes care of storing embedded data and performing vector search for you.
""",
"""
Today's applications are required to be highly responsive and always online. To achieve low latency and high availability, instances of these applications need to be deployed in datacenters that are close to their users. Applications need to respond in real time to large changes in usage at peak hours, store ever increasing volumes of data, and make this data available to users in milliseconds.
""",
]
db.add_texts(texts)
and as a result you will see the list of IDs generated for the added texts, something like:
['58f12150-2bc4-11ee-9ff5-ac87a32b530e',
'58f12240-2bc4-11ee-9ff5-ac87a32b530e']
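To confirm the texts were actually embedded, you can run a quick similarity search against the store. A minimal sketch, assuming the code above has already run; the query string and k value are only illustrative:

query = "How do vector stores work with unstructured data?"  # example query, any text works
docs = db.similarity_search(query, k=1)  # embeds the query with OpenAIEmbeddings and returns the closest stored text
print(docs[0].page_content)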