Tags: python, openai-api, langchain

How to provide embedding function to a langchain vector store


I am trying to get a simple vector store (Chroma) to embed texts using the add_texts method with LangChain. However, I get the following error, even though the OpenAI package works fine in a different, simpler LangChain scenario:

ValueError: You must provide embeddings or a function to compute them

Code:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

db = Chroma()

texts = [
    """
    One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query. A vector store takes care of storing embedded data and performing vector search for you.
    """,
    """
    Today's applications are required to be highly responsive and always online. To achieve low latency and high availability, instances of these applications need to be deployed in datacenters that are close to their users. Applications need to respond in real time to large changes in usage at peak hours, store ever increasing volumes of data, and make this data available to users in milliseconds.
    """,
]

db.add_texts(texts, embedding_function=OpenAIEmbeddings())  # raises ValueError

Solution

  • embedding_function needs to be passed when you construct the Chroma object (source: the Chroma class code); add_texts does not accept it as a keyword argument.

    So your code would be:

    from langchain.embeddings.openai import OpenAIEmbeddings
    from langchain.vectorstores import Chroma
    
    db = Chroma(embedding_function=OpenAIEmbeddings())
    
    texts = [
        """
        One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query. A vector store takes care of storing embedded data and performing vector search for you.
        """,
        """
        Today's applications are required to be highly responsive and always online. To achieve low latency and high availability, instances of these applications need to be deployed in datacenters that are close to their users. Applications need to respond in real time to large changes in usage at peak hours, store ever increasing volumes of data, and make this data available to users in milliseconds.
        """,
    ]
    db.add_texts(texts)
    

    and as the result you will see the IDs of the added documents:

    ['58f12150-2bc4-11ee-9ff5-ac87a32b530e',
     '58f12240-2bc4-11ee-9ff5-ac87a32b530e']
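
Note that embedding_function does not have to be OpenAIEmbeddings: Chroma only needs an object exposing LangChain's embedding interface, i.e. embed_documents (for stored texts) and embed_query (for queries). A minimal sketch with a hypothetical deterministic stub (the class name FakeEmbeddings and its character-sum "embedding" are made up for illustration, and need no API key) shows the interface the vector store calls:

```python
from typing import List

class FakeEmbeddings:
    """Hypothetical stand-in mimicking the interface Chroma expects
    from embedding_function: embed_documents() and embed_query()."""

    def __init__(self, size: int = 4):
        self.size = size  # dimensionality of the toy vectors

    def _embed(self, text: str) -> List[float]:
        # Deterministic toy "embedding" based on character codes;
        # not semantic, just shape-compatible with a real embedder.
        return [float((sum(map(ord, text)) + i) % 7) for i in range(self.size)]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)

# With this stub you could construct the store the same way:
#   db = Chroma(embedding_function=FakeEmbeddings())
#   db.add_texts(texts)  # now succeeds without an OpenAI key

vectors = FakeEmbeddings().embed_documents(["hello", "world"])
print(len(vectors), len(vectors[0]))  # 2 4
```

This is handy for testing the add_texts/similarity_search plumbing before wiring in a real (and billable) embedding model.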