Tags: python, langchain, py-langchain

Does the SQLDatabase agent for langchain store the database in memory?


Say I have a huge database stored in BigQuery, and I'd like to use the SQL Database Agent to query it using natural language prompts. I can't hold the data in memory because it's too large. Does `SQLDatabase` load the data into memory?

If so, can I query the source data directly without loading everything? I'm comfortable with the latencies involved in read operations from disk. This may sound like a novice question, but I'm gradually exploring this package as I go.


Solution

  • Quoting from Dariel Dato-on's answer to Does anyone know what is roughly the database size limit for SQL agent? #3730:

    The SQLDatabaseChain will still work if you have a really large database. SQLDatabaseChain doesn't load the entire database into memory. Instead, it formulates a SQL query based on your input and then runs it against the database. The only time you may run into issues is if the SQL query result is too long and exceeds the prompt token limit for your chosen LLM. For example, davinci has a limit of about 4,000 tokens.

    If you're using the SQL Database Agent (rather than the chain), be aware that it will start with listing all the tables in the db. This can also potentially cause problems if the number of tables is too large and exceeds the prompt token limit.

    But quoting from follow-up discussion, it does "load up the DDL of the database into the context of the current session. [...] so [...] if the database has a large DDL [...] that [can] also create an issue".
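The behavior quoted above can be illustrated with a self-contained sketch using the stdlib `sqlite3` module as a stand-in for BigQuery (the file and table names here are invented for the demo): the chain's job is just to run a generated SQL statement against the live connection, so only the rows that statement returns are materialized in memory, while the table itself stays on disk.

```python
# Sketch of the mechanics described above, with stdlib sqlite3 standing in
# for BigQuery. The database file and table are throwaway examples.
import os
import sqlite3
import tempfile

# Build a throwaway on-disk database with a "large" table.
path = os.path.join(tempfile.mkdtemp(), "events.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")
conn.executemany(
    "INSERT INTO events (kind) VALUES (?)",
    [("click" if i % 100 == 0 else "view",) for i in range(100_000)],
)
conn.commit()

# This mirrors what SQLDatabaseChain does with the LLM-generated SQL:
# execute it against the connection and fetch only the result rows.
# The 100,000-row table is never pulled into memory; just this aggregate.
rows = conn.execute(
    "SELECT COUNT(*) FROM events WHERE kind = 'click'"
).fetchall()
print(rows)
conn.close()
```

On the large-DDL caveat from the follow-up discussion: LangChain's `SQLDatabase.from_uri` accepts an `include_tables` parameter that restricts which tables' schemas are put into the prompt, which is one way to keep the DDL context small for databases with many tables.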