I do not understand why the use of PydanticOutputParser below is erroring.
The docs do not seem correct: if I follow them exactly (i.e. use with_structured_output exclusively, without an output parser), the output is a dict, not a Pydantic class. So I thought I had modified the example consistently with some SO answers, e.g. this one.
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser
from uuid import uuid4
from pydantic import BaseModel, Field
class TestSummary(BaseModel):
    """Represents a summary of the concept"""
    id: str = Field(default_factory=lambda: str(uuid4()), description="Unique identifier")
    summary: str = Field(description="Succinct summary")
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0).with_structured_output(TestSummary)
parser = PydanticOutputParser(pydantic_object=TestSummary)
prompt = PromptTemplate(
    template="You are an AI summarizing long texts. TEXT: {stmt}",
    input_variables=["stmt"]
)
runnable = prompt | llm | parser
result = runnable.invoke({"stmt": "This is a really long piece of literature I'm too lazy to read"})
The error is
ValidationError: 1 validation error for Generation
text
str type expected (type=type_error.str)
As discussed, if I omit the output parser, I get a dict:
runnable = prompt | llm #| parser
result = runnable.invoke({"stmt": "This is a really long piece of literature I'm too lazy to read"})
type(result)
dict
Output parsers in LangChain receive a string, not structured data. They are used to do what you are already doing with with_structured_output: parse a string from the model into structured data, or possibly change its format.
From the documentation:
Output parsers are classes that help structure language model responses. There are two main methods an output parser must implement:
- "Get format instructions": A method which returns a string containing instructions for how the output of a language model should be formatted.
- "Parse": A method which takes in a string (assumed to be the response from a language model) and parses it into some structure.
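That two-method contract can be sketched with a toy parser; this uses only the standard library (not the real LangChain classes, so the names here are purely illustrative):

```python
import json
from dataclasses import dataclass


@dataclass
class Summary:
    id: str
    summary: str


class ToySummaryParser:
    """Illustrative stand-in for an output parser: both methods work on strings."""

    def get_format_instructions(self) -> str:
        # A string meant to be embedded in the prompt sent to the LLM.
        return 'Respond with a JSON object: {"id": "...", "summary": "..."}'

    def parse(self, text: str) -> Summary:
        # Takes the raw string response from the model and builds the structure.
        data = json.loads(text)
        return Summary(id=data["id"], summary=data["summary"])


parser = ToySummaryParser()
result = parser.parse('{"id": "abc", "summary": "A short recap."}')
print(result.summary)  # A short recap.
```

This is why piping the output of with_structured_output into a parser fails: the parser's parse step is handed an already-structured object where it expects the model's raw text.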
Now that you have the structured data, you just need to fill the model with it, as in https://stackoverflow.com/a/64505888/3443596:
runnable = prompt | llm
result_dict = runnable.invoke({"stmt": "This is a really long piece of literature I'm too lazy to read"})
result = TestSummary.parse_obj(result_dict)
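Note that parse_obj is the Pydantic v1 name; under Pydantic v2 the equivalent is model_validate. A minimal sketch, independent of LangChain (the example dict stands in for what the runnable returns):

```python
from uuid import uuid4

from pydantic import BaseModel, Field


class TestSummary(BaseModel):
    """Represents a summary of the concept"""
    id: str = Field(default_factory=lambda: str(uuid4()), description="Unique identifier")
    summary: str = Field(description="Succinct summary")


# A dict shaped like the runnable's output when the parser is omitted.
result_dict = {"summary": "A short recap of the text."}

result = TestSummary.model_validate(result_dict)  # parse_obj in Pydantic v1
print(type(result).__name__)  # TestSummary
```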