python openai-api langchain large-language-model

How to properly store and load my own embeddings in a Redis vector DB


Here is a simple example that uses Redis with embeddings, but it's not clear how I can build and store my own embeddings, then pull them from Redis and use them in a search:

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.redis import Redis

embeddings = OpenAIEmbeddings()
metadata = [
    {
        "user": "john",
        "age": 18,
        "job": "engineer",
        "credit_score": "high"
    }
]
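texts = ["foo", "foo", "foo", "bar", "bar"]
# from_texts expects one metadata dict per text, so repeat the single entry
metadata = metadata * len(texts)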

rds = Redis.from_texts(
    texts,
    embeddings,
    metadata,
    redis_url="redis://localhost:6379",
    index_name="users",
)

results = rds.similarity_search("foo")
print(results[0].page_content)

But I want to load text from, e.g., a text file, create embeddings, and load them into Redis for later use. Something like this:

from openai import OpenAI
client = OpenAI()

def get_embedding(text, model="text-embedding-ada-002"):
    text = text.replace("\n", " ")
    return client.embeddings.create(input=[text], model=model).data[0].embedding

Does anyone have a good example of how to implement this approach? I'm also wondering about TTL for embeddings in Redis.


Solution

  • Hello! You can use TextLoader to load the text file and a text splitter to split it into documents, like this:

    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores.redis import Redis
    from langchain.document_loaders import TextLoader
    from langchain.text_splitter import CharacterTextSplitter
    
    
    embeddings = OpenAIEmbeddings()
    
    loader = TextLoader("union.txt", encoding="utf-8")
    
    documents = loader.load()
    
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    docs = text_splitter.split_documents(documents)
    
    vectorstore = Redis.from_documents(
        docs,
        embeddings,
        redis_url="redis://localhost:6379",
        index_name="users",
    )
    
    
    results = vectorstore.similarity_search_with_score("He met the Ukrainian people.")
    print(results)
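
The snippet above uses OpenAIEmbeddings, and the two other parts of the question (plugging in your own get_embedding function, and TTL) are not covered, so here is a minimal sketch of how they might be handled. It rests on some assumptions: CallableEmbeddings and expire_keys are hypothetical names, not part of LangChain or redis-py. LangChain vector stores call embed_documents and embed_query on whatever embeddings object they are given, so a thin adapter around your own function should be enough. For TTL, the sketch simply calls expire() on each key, assuming a redis-py style client.

```python
from typing import Callable, Iterable, List

try:
    # LangChain's abstract embeddings interface (import path in the style of
    # the snippets above; newer releases also expose it from langchain_core).
    from langchain.embeddings.base import Embeddings
except ImportError:
    # Fallback so this sketch still runs without LangChain installed.
    Embeddings = object


class CallableEmbeddings(Embeddings):
    """Hypothetical adapter: exposes any text -> vector function through the
    embed_documents / embed_query methods that vector stores call."""

    def __init__(self, embed_fn: Callable[[str], List[float]]):
        self.embed_fn = embed_fn

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # One call per chunk: simple, but slow for large files.
        return [self.embed_fn(text) for text in texts]

    def embed_query(self, text: str) -> List[float]:
        return self.embed_fn(text)


def expire_keys(client, keys: Iterable[str], ttl_seconds: int) -> int:
    """Set a TTL on each key via EXPIRE and return how many keys were updated.

    `client` is assumed to be a redis-py style client with an expire() method.
    """
    return sum(1 for key in keys if client.expire(key, ttl_seconds))
```

With the get_embedding function from the question, this could be passed as Redis.from_documents(docs, CallableEmbeddings(get_embedding), redis_url=..., index_name=...). For expiry, Redis.from_texts_return_keys returns the created keys along with the store, and as far as I know those keys can be handed to expire_keys together with a plain redis-py client (for example redis.Redis.from_url(...)). Later you can reconnect to the same index instead of re-embedding, as long as the keys have not expired.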