python, solr, fastapi, langchain

How to use Solr as retriever in RAG


I want to build a RAG (Retrieval Augmented Generation) service with LangChain, using Solr as the retriever. There is already a Python package, eurelis-langchain-solr-vectorstore, that lets you use Solr with LangChain, but how do I define the server credentials? Also, my embedding model is already running on a server, and I don't know how to plug it in. I was thinking of something like this, but I'm not sure it is right:

import requests
from eurelis_langchain_solr_vectorstore import Solr

embeddings_model = requests.post("http://server-insight/embeddings/")


solr = Solr(embeddings_model, core_kwargs={
        'page_content_field': 'text_t',  # field containing the text content
        'vector_field': 'vector',        # field containing the embeddings of the text content
        'core_name': 'langchain',        # core name
        'url_base': 'http://localhost:8983/solr' # base url to access solr
    })  # with custom default core configuration


retriever = solr.as_retriever()

Solution

  • For the first question: basic-auth credentials can be passed in the URL using the login:password@ pattern:

    http://localhost:8983/solr => http://login:password@localhost:8983/solr

    For the second question: to use your embeddings server, you need to give the Solr vector store an instance of a class that inherits from langchain_core.embeddings.embeddings.Embeddings.

    That class must implement both

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed search docs."""
    

    and

    def embed_query(self, text: str) -> List[float]:
        """Embed query text."""
    

    Both methods can call your http://server-insight/embeddings/ endpoint.

    The first method is used at indexing time: it takes a list of texts and returns a list of embeddings.

    The second is used at query time: it takes a single text and returns a single embedding (a single list of floats).
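
To make both parts of the answer concrete, here is a minimal sketch, not a definitive implementation. The names RemoteEmbeddings and with_basic_auth are hypothetical, and the JSON contract of the embeddings server (a {"texts": [...]} request and an {"embeddings": [[...], ...]} response) is an assumption that must be adapted to what http://server-insight/embeddings/ actually expects. The helper percent-encodes the login and password, which matters when they contain characters such as @ or /.

```python
import json
from typing import List
from urllib import request
from urllib.parse import quote, urlsplit, urlunsplit

try:
    from langchain_core.embeddings import Embeddings
except ImportError:
    # Fallback only so this sketch runs without LangChain installed.
    class Embeddings:  # type: ignore[no-redef]
        pass


def with_basic_auth(url_base: str, login: str, password: str) -> str:
    """Return url_base with percent-encoded login:password@ credentials."""
    parts = urlsplit(url_base)
    netloc = f"{quote(login, safe='')}:{quote(password, safe='')}@{parts.netloc}"
    return urlunsplit(parts._replace(netloc=netloc))


class RemoteEmbeddings(Embeddings):
    """Embeddings computed by a remote HTTP endpoint (hypothetical class)."""

    def __init__(self, url: str, timeout: float = 30.0) -> None:
        self.url = url
        self.timeout = timeout

    def _post(self, texts: List[str]) -> List[List[float]]:
        # ASSUMPTION: the server accepts {"texts": [...]} and answers with
        # {"embeddings": [[...], ...]}; adjust both keys to the real API.
        body = json.dumps({"texts": texts}).encode("utf-8")
        req = request.Request(
            self.url, data=body, headers={"Content-Type": "application/json"}
        )
        with request.urlopen(req, timeout=self.timeout) as resp:
            return json.load(resp)["embeddings"]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed search docs (called at indexing time)."""
        return self._post(texts)

    def embed_query(self, text: str) -> List[float]:
        """Embed query text (called at query time)."""
        return self._post([text])[0]
```

With these in place, the snippet from the question would pass RemoteEmbeddings("http://server-insight/embeddings/") as the first argument to Solr(...) instead of the requests.post(...) result, and with_basic_auth("http://localhost:8983/solr", login, password) as the 'url_base' value in core_kwargs.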