
Langchain/Huggingface Pipeline Error about model_kwargs which I did not include


I am currently trying to use the Helsinki-NLP/opus-mt-en-de and de-en models. I was trying to set up a pipeline and use both in an LLMChain, but I keep getting the same error:

ValueError: The following `model_kwargs` are not used by the model: ['pipeline_kwargs', 'return_full_text'] (note: typos in the generate arguments will also show up in this list)

I used the following snippet to initialise both models, then ran the line below it to test the output:

from langchain.chains import LLMChain
from langchain_community.llms import HuggingFacePipeline
from langchain_core.prompts import PromptTemplate
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline


def get_translation_chains():
    _de_en_translation_prompt = PromptTemplate.from_template(
        """Translate the following text from German to English:
        {text}
        """
    )

    _en_de_translation_prompt = PromptTemplate.from_template(
        """Translate the following text from English to German:
        {text}
        """
    )

    _en_to_de_tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
    _en_to_de_model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-de")
    _de_to_en_tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-de-en")
    _de_to_en_model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-de-en")

    _en_to_de_pipeline = pipeline(
        model=_en_to_de_model,
        tokenizer=_en_to_de_tokenizer,
        task="translation",
    )

    _de_to_en_pipeline = pipeline(
        model=_de_to_en_model,
        tokenizer=_de_to_en_tokenizer,
        task="translation",
    )

    _de_to_en_llm = HuggingFacePipeline(pipeline=_de_to_en_pipeline)
    _en_to_de_llm = HuggingFacePipeline(pipeline=_en_to_de_pipeline)

    _de_to_en_chain = LLMChain(
        prompt=_de_en_translation_prompt,
        llm=_de_to_en_llm,
    )

    _en_to_de_chain = LLMChain(
        prompt=_en_de_translation_prompt,
        llm=_en_to_de_llm,
    )

    return _en_to_de_chain, _de_to_en_chain


en_to_de_chain, de_to_en_chain = get_translation_chains()

print(en_to_de_chain.invoke({"text": "Hello, how are you?"}))

I am fairly new to using LLMs and to both the Hugging Face and LangChain libraries, and I could not find anything to give me a clue on this one.

I tried creating the pipeline with only the specific task I wanted ("translation_de_to_en", and the other way around), as well as using plain "translation" for both. I also tried setting the kwargs option to None and to False, but with no success.


Solution

  • Your code doesn't have any errors.

    The reason for the error is that, as of version 0.0.28 of langchain-community, only the tasks

    • text2text-generation
    • text-generation
    • summarization

    are supported with HuggingFacePipeline.

    Your task is translation, which is not yet supported.

    As to why the error occurs: LangChain passes the argument return_full_text (see this, line 264) to the underlying Hugging Face model. However, MarianMTModel (the model you're using) doesn't accept this parameter.
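    To make the mechanism concrete: before generating, transformers checks whether every entry in model_kwargs is something the model can actually consume, and reports the leftovers in exactly the ValueError you saw. The following is my own stripped-down sketch of that idea, not the actual transformers code; the function names are made up for illustration:

    ```python
    import inspect

    def validate_model_kwargs(forward_fn, model_kwargs):
        """Simplified stand-in for the validation transformers' generate()
        performs: any kwarg the model's forward() doesn't accept is
        reported as unused."""
        accepted = set(inspect.signature(forward_fn).parameters)
        unused = [k for k in model_kwargs if k not in accepted]
        if unused:
            raise ValueError(
                f"The following `model_kwargs` are not used by the model: {unused}"
            )

    # A MarianMT-style forward() has no `return_full_text` parameter,
    # so the kwarg LangChain injects ends up in the "unused" list.
    def marian_forward(input_ids=None, attention_mask=None, labels=None):
        pass

    try:
        validate_model_kwargs(marian_forward, {"return_full_text": True})
    except ValueError as e:
        print(e)
    ```

    This is why the error message mentions arguments you never wrote yourself: LangChain added them on your behalf.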

    You're better off using the Hugging Face pipeline directly. This is the easiest solution.

    translation = _en_to_de_pipeline("Hello, how are you?")
    print(translation)
    

    Output

    [{'translation_text': "Hallo, wie geht's?"}]
    

    It returns without error.
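    If you still want chain-style composition (prompt in, translation out) without HuggingFacePipeline, you can wrap the pipeline in a thin adapter yourself. A minimal sketch, using a stub translator in place of the real pipeline so it runs without downloading a model; `make_translation_chain` is a hypothetical helper name, not a library API:

    ```python
    def make_translation_chain(translator, template):
        """Return a callable mimicking LLMChain.invoke's {'text': ...}
        interface: format the prompt, run the translator, unwrap the result."""
        def invoke(inputs):
            prompt = template.format(**inputs)
            result = translator(prompt)  # pipelines return [{'translation_text': ...}]
            return {"text": inputs["text"],
                    "translation": result[0]["translation_text"]}
        return invoke

    # Stub standing in for _en_to_de_pipeline; the real pipeline
    # returns output of the same shape.
    def fake_en_to_de(text):
        return [{"translation_text": "Hallo, wie geht's?"}]

    chain = make_translation_chain(fake_en_to_de, "{text}")
    print(chain({"text": "Hello, how are you?"}))
    ```

    Swapping `fake_en_to_de` for the real `_en_to_de_pipeline` gives you the same invoke-style call your original code was aiming for, while sidestepping the unsupported-task check entirely.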