I want to build a RAG (Retrieval-Augmented Generation) service with LangChain, and I want to use Solr for the retriever.
There is already a Python package, eurelis-langchain-solr-vectorstore, that lets you use Solr with LangChain, but how do I define the server credentials? Also, my embedding model is already running on a server. I was thinking of something like this, but I'm not sure it is right:
import requests
from eurelis_langchain_solr_vectorstore import Solr
embeddings_model = requests.post("http://server-insight/embeddings/")
solr = Solr(embeddings_model, core_kwargs={
    'page_content_field': 'text_t',  # field containing the text content
    'vector_field': 'vector',  # field containing the embeddings of the text content
    'core_name': 'langchain',  # core name
    'url_base': 'http://localhost:8983/solr'  # base url to access solr
})  # with custom default core configuration
retriever = solr.as_retriever()
For the first question: for basic authentication, you can put the credentials in the URL using the login:password@ pattern:
http://localhost:8983/solr => http://login:password@localhost:8983/solr
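If the login or password contains characters such as "@", ":" or "/", they need to be percent-encoded before being placed in the URL. A minimal sketch of that, using only the standard library; the helper name with_basic_auth is hypothetical and not part of the package:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_basic_auth(url: str, login: str, password: str) -> str:
    """Return `url` with percent-encoded `login:password@` userinfo added.

    Hypothetical helper for illustration; not part of any library.
    """
    parts = urlsplit(url)
    # Drop any userinfo already present so credentials are not duplicated.
    host_and_port = parts.netloc.rpartition("@")[2]
    # safe="" makes quote() also encode "/", "@" and ":".
    userinfo = f"{quote(login, safe='')}:{quote(password, safe='')}"
    return urlunsplit(parts._replace(netloc=f"{userinfo}@{host_and_port}"))


url_base = with_basic_auth("http://localhost:8983/solr", "login", "password")
print(url_base)
```

The resulting string can then be used as the 'url_base' value in core_kwargs.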
For the second one: to use your embeddings server, you need to pass the Solr vector store an instance of a class inheriting from langchain_core.embeddings.embeddings.Embeddings.
It must implement both
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Embed search docs."""
and
def embed_query(self, text: str) -> List[float]:
"""Embed query text."""
In both methods you can call your http://server-insight/embeddings/ endpoint.
The first method is used at indexing time: it takes a list of texts and returns a list of embeddings.
The second one is used at query time: it takes a single text and returns a single embedding (a single list of floats).