langchain, large-language-model

Problem instantiating and using GPT4AllEmbeddings


UPDATE: After posting this question I found that this issue had already been raised on GitHub (https://github.com/nomic-ai/gpt4all/issues/1394). I can delete this question, but can anyone suggest a workaround, maybe an alternative way to generate embeddings? Thanks!

I have been trying to build my first application using LangChain, Chroma and a local LLM (Ollama in my case). I've been following the (very straightforward) steps from:

https://python.langchain.com/docs/integrations/llms/ollama and also tried https://python.langchain.com/docs/integrations/text_embedding/gpt4all

The problem I'm having is with the step that creates embeddings using the GPT4AllEmbeddings model. I can see the model is downloaded to ~/.cache/gpt4all/ggml-all-MiniLM-L6-v2-f16.bin, although its size is 45.5 MB, which is surprisingly small. But when I try to use it, it fails with this error:

>>> gpt4all_embd = GPT4AllEmbeddings()
100%|████████████████████████████████████████████| 45.5M/45.5M [00:05<00:00, 7.66MiB/s]
Model downloaded at:  /<MY-HOME-PATH>/.cache/gpt4all/ggml-all-MiniLM-L6-v2-f16.bin
Invalid model file
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for GPT4AllEmbeddings
__root__
  Unable to instantiate model (type=value_error)

I have tried the same steps on different machines and I'm still getting the same error. Googling didn't help.


Solution

  • Please follow the steps below.

    In your activated virtual environment

     pip install -U langchain
    
     pip install gpt4all
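
    To confirm that the upgrade took effect in the active environment (and to have exact versions on hand when comparing machines or commenting on the GitHub issue), a small check like the one below may help. It is only a sketch: it uses the standard-library importlib.metadata module, and the helper name installed_versions is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            found[name] = None
    return found


# Report the two packages this answer asks you to install or upgrade.
for name, ver in installed_versions(["langchain", "gpt4all"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

    If either package shows up as not installed, the pip commands were probably run outside the virtual environment that Python is using.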
    

    Sample code

    from langchain.embeddings import GPT4AllEmbeddings

    gpt4all_embd = GPT4AllEmbeddings()
    query_result = gpt4all_embd.embed_query("This is test doc")
    print(query_result)
    

    Another option: generate embeddings through HuggingFace

    pip install langchain sentence_transformers
    

    Sample Code

    from langchain.embeddings import HuggingFaceEmbeddings
    
    embeddings = HuggingFaceEmbeddings()
    text = "This is a test document."
    query_result = embeddings.embed_query(text)
    print(query_result[:3])
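
    If you are not sure which of the two options will work on a given machine, the two can be combined into a simple fallback: try GPT4AllEmbeddings first and switch to HuggingFaceEmbeddings if instantiation fails. This is a minimal sketch, not a tested fix for the linked issue. The helper names (first_working_embedder, make_gpt4all, make_huggingface) are made up for this example, and the imports assume the same langchain.embeddings module path used above.

```python
def make_gpt4all():
    # Imported lazily so a missing or broken package only fails this backend.
    from langchain.embeddings import GPT4AllEmbeddings
    return GPT4AllEmbeddings()


def make_huggingface():
    from langchain.embeddings import HuggingFaceEmbeddings
    return HuggingFaceEmbeddings()


def first_working_embedder(factories):
    """Call each zero-argument factory in order and return the first result.

    Any exception (import error, validation error such as
    "Unable to instantiate model", ...) moves on to the next factory.
    """
    errors = []
    for factory in factories:
        try:
            return factory()
        except Exception as exc:
            errors.append(f"{factory.__name__}: {exc}")
    raise RuntimeError(
        "No embedding backend could be created:\n" + "\n".join(errors)
    )


# Usage (requires langchain plus gpt4all and/or sentence_transformers):
# embeddings = first_working_embedder([make_gpt4all, make_huggingface])
# print(embeddings.embed_query("This is a test document.")[:3])
```

    Keep in mind that the two backends are separate embedding setups, so vectors produced by one should not be mixed with vectors from the other in the same Chroma collection.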