Search code examples
databasestored-proceduresvoltdb

Do database stored procedures suffer from poor performance when returning large volumes of data?


Under VoltDB Don'ts

http://community.voltdb.com/DosAndDonts

They state

Don't create queries that return large volumes of data (such as SELECT * FROM FOO with no constraints) especially for multi-partition transactions. Be conservative in the data returned by stored procedures.

This is one of the use cases that I have. Is there some aspect of stored procedures that make them unsuitable for this type of query or is it something specific to VoltDB? Under this scenario, would performance degrade to a level that would be worse than a traditional RDBMS such as Postgres?

Edit: My query is not quite a select * from foo but I will need to select all financial transactions between certain date ranges and this could exceed 100m rows


Solution

  • All databases need to pay the materialization and i/o costs to transfer a large result set back to a user.

    However, I can speak specifically to VoltDB.

    In VoltDB, stored procedures are all transactions. Even a result set that selects a large portion of the database is a fully isolated from other concurrent procedures. The tuples in that result set need to be momentarily buffered internally (for example for cross-partition ordering or limiting) and then returned to the user.

    The combination of needing to maintain full isolation over a result that can take many milliseconds (or seconds) of I/O to return to the user and the aggregation that happens at the coordinating node of a multi-partition procedure limits the maximum result set size.

    I suspect a future release will address this limitation - many people have data-access requirements similar to what you describe.