python langchain agent

How do I retrieve a DataFrame as a local variable after LangChain tool execution?


I’m using LangChain with a ReAct agent to load an Excel file, transform it, and store the result in a DataFrame named df. However, the final agent_executor.invoke() call only returns a dictionary describing what happened rather than the actual DataFrame.

I would like to have the resulting DataFrame accessible as a local variable in my main script. Here is my code:

import pandas as pd
from langchain.agents import AgentExecutor, create_react_agent, Tool
from langchain.tools import tool
from langchain_community.tools import ShellTool
from langchain_openai import ChatOpenAI
from langchain_experimental.utilities import PythonREPL
from langchain import hub

@tool
def load_excel_response(filepath: str):
    """Load an Excel file and return a transformed DataFrame."""
    df = pd.read_excel(filepath, skiprows=2, names=['responses']).reset_index(drop=True)
    df = df.select_dtypes(include=['object'])
    return df # df.to_json()

llm = ChatOpenAI(model='gpt-4o-mini')
python_repl = PythonREPL()
shell_tool = ShellTool()

repl_tool = Tool(
    name="python_repl",
    description="Run Python commands in a REPL environment.",
    func=python_repl.run,
)

tools_list = [load_excel_response, repl_tool, shell_tool]
prompt = hub.pull("hwchase17/react")

shell_agent = create_react_agent(llm=llm, tools=tools_list, prompt=prompt)
agent_executor = AgentExecutor(agent=shell_agent, tools=tools_list, verbose=True)

path = 'excel_file.xlsx'
user_prompt = f"Load the Excel file located at {path} and assign the result to a variable named df."

response = agent_executor.invoke({"input": user_prompt})
print("Agent response:", response)

What I see happening:

  • The agent loads the Excel file (confirmed by the verbose=True result)
  • The final response from agent_executor.invoke() is a dictionary summarizing the agent’s steps and final message, not the actual DataFrame data.
  • I want to return df after the agent execution.

What I have tried:

  • Creating a global df variable and assigning the agent executor's result to it.
  • Returning df.to_json() from load_excel_response instead of the DataFrame.

No matter what, the agent’s final response is still just a summary.

Question:

How can I actually retrieve the DataFrame as a local variable in my main Python script after the agent finishes? I need the raw data, not just the textual summary. What’s the correct approach with LangChain so that I can have a fully formed DataFrame (or at least something I can reconstitute into one) in my script?


Solution

  • I think you need to make sure the tool returns structured JSON output. Once you have that, you can flatten it and convert it back into a DataFrame with a few lines of Python.

    Expecting an agent to hand back a DataFrame directly as its output might not work well.
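To make the JSON idea concrete, here is a minimal sketch of the round trip using only pandas. The helper names dataframe_to_payload and payload_to_dataframe and the sample data are hypothetical and not part of the question's code. The sketch assumes the tool returns df.to_json(orient="split") and that the main script can get hold of that string unchanged.

```python
from io import StringIO

import pandas as pd


def dataframe_to_payload(df: pd.DataFrame) -> str:
    """Serialize a DataFrame to a JSON string (what the tool would return)."""
    # orient="split" stores columns, index and data separately,
    # which keeps the column order intact on the way back.
    return df.to_json(orient="split")


def payload_to_dataframe(payload: str) -> pd.DataFrame:
    """Rebuild a DataFrame in the main script from the JSON string."""
    # Wrapping the string in StringIO avoids the deprecation warning that
    # newer pandas versions emit for literal JSON strings.
    return pd.read_json(StringIO(payload), orient="split")


# Hypothetical stand-in for the data loaded from the Excel file.
original = pd.DataFrame({"responses": ["yes", "no", "maybe"]})

payload = dataframe_to_payload(original)  # string produced by the tool
df = payload_to_dataframe(payload)        # local DataFrame in the main script
print(df)
```

The string has to come from the tool's own return value rather than from the agent's final answer, since the final answer is text written by the LLM. One way to reach the tool's return value, assuming the executor is built with return_intermediate_steps=True, is to read response["intermediate_steps"], which holds (action, observation) pairs for each tool call.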