openai-api langchain faiss

Is there a way to use LangChain FAISS without an AI?


I'm working on an AI project, but my current problem is that FAISS is taking far too long to load the documents, so I've moved it into its own service via FastAPI.

Everything looks OK, but when I run it I get this error:

Did not find openai_api_key, please add an environment variable `OPENAI_API_KEY`

In my code:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(documents, embeddings)

I am using OpenAI, but not in this service, so I did not add my key.

From my understanding, it's just taking text, tokenizing it using OpenAI's token map, and then doing a search to find the nearest related documents based on that query.

Technically, that does not actually reach out to OpenAI's servers, does it?

Afterwards, I'm just adding the related documents to the prompt that I send to OpenAI's servers, so if it's sending data to OpenAI twice, that's a tad inefficient, right?

How can I get this to just be its own service? Or am I wasting my time here?


Solution

  • "I am using OpenAI, but not in this service, so I did not add my key."

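    Calling FAISS.from_documents(documents, embeddings) embeds the documents. With OpenAIEmbeddings, the vectors are computed by a model on OpenAI's servers, not by a local tokenizer, so embedding the documents requires calls to the OpenAI API. The same is true at search time, when the query itself has to be embedded.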

    Per the documentation:

    To use, you should have the openai python package installed, and the environment variable OPENAI_API_KEY set with your API key or pass it as a named parameter to the constructor.

    https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.openai.OpenAIEmbeddings.html

    "Afterwards, I'm just adding the related documents to the prompt that I send to OpenAI's servers, so if it's sending data to OpenAI twice, that's a tad inefficient, right?"

    Maybe, but

    1. It's a trivial amount of text data, and
    2. OpenAI doesn't have a vector search product, so any approach that uses both OpenAI embeddings and OpenAI LLMs will require two requests.
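
To make the two requests concrete, here is a minimal sketch of the retrieve-then-prompt flow. Both fake_embed and fake_llm are hypothetical stand-ins for the remote embedding and chat calls (they are not OpenAI or LangChain APIs), and the three-word "embedding" is a toy. The only point is that each question costs one embedding request plus one LLM request.

```python
import math
from typing import List, Tuple

Vector = List[float]

# Counts simulated network requests so the cost per question is visible.
calls = {"embedding": 0, "llm": 0}


def fake_embed(text: str) -> Vector:
    """Hypothetical stand-in for a remote embedding request."""
    calls["embedding"] += 1
    words = text.lower().split()
    # Toy 3-dimensional vector: counts of three hand-picked words.
    return [float(words.count(w)) for w in ("faiss", "key", "prompt")]


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a remote chat/completion request."""
    calls["llm"] += 1
    return f"(answer based on {len(prompt)} prompt characters)"


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def answer(question: str, docs: List[str], doc_vectors: List[Vector],
           k: int = 1) -> Tuple[str, str]:
    query_vector = fake_embed(question)  # request 1: embed the query
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(query_vector, doc_vectors[i]),
                    reverse=True)
    context = "\n".join(docs[i] for i in ranked[:k])
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return prompt, fake_llm(prompt)  # request 2: ask the LLM


docs = [
    "faiss stores vectors and searches them",
    "the api key is read from the environment",
]
doc_vectors = [fake_embed(d) for d in docs]  # done once, at indexing time
calls["embedding"] = 0  # from here on, count per-question requests only

prompt, reply = answer("where does the key come from", docs, doc_vectors)
print(calls)
```

The nearest-neighbour search in the middle is purely local; only the query embedding and the final prompt would leave the machine.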

    "Is there a way to use LangChain FAISS without an AI?"

    There are a few approaches you could take:

    1. Run a local model. This is not "without AI," but I'm guessing you really mean "without OpenAI." There are various language models that can be used to embed a sentence/paragraph into a vector. Here's an example.
    2. Bite the bullet and use OpenAI or some other API to get the embeddings. LangChain has a list of supported embeddings here.
    3. Use something like scikit-learn's TfidfVectorizer. This is not AI: it's an approach where each keyword in your input is mapped to one element of the output vector. This is no longer semantic search, but keyword search. For example, "street" and "road" would vectorize to totally different things. That might be good enough for your application, though.
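
As a rough illustration of option 3, here is a hand-rolled TF-IDF sketch in plain Python. It is a simplified stand-in for scikit-learn's TfidfVectorizer, not its implementation: the whitespace tokenizer, the toy corpus, and the function names are all assumptions made for this example, and only the smoothed IDF formula follows scikit-learn's documented default. It shows the keyword-only behavior described above: a query for "street" scores zero against a document that only mentions "road".

```python
import math
from collections import Counter
from typing import Dict, List

SparseVector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    # Assumption: naive lowercase whitespace split, no stemming or stop words.
    return text.lower().split()


def fit_idf(corpus: List[str]) -> Dict[str, float]:
    n = len(corpus)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in corpus for term in set(tokenize(doc)))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(text: str, idf: Dict[str, float]) -> SparseVector:
    # Terms never seen in the corpus are dropped, as with a fitted vocabulary.
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(a: SparseVector, b: SparseVector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


corpus = [
    "the street is closed for repairs",
    "the road is open again",
    "faiss builds a vector index",
]
idf = fit_idf(corpus)
doc_vectors = [vectorize(d, idf) for d in corpus]


def search(query: str) -> List[float]:
    """Return one cosine score per corpus document."""
    q = vectorize(query, idf)
    return [cosine(q, v) for v in doc_vectors]


scores = search("street")
print(scores)  # only the first document has a non-zero score
```

No API key or model download is involved here, which is the appeal; the cost is that synonyms and paraphrases are invisible to the search.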