Search code examples
sqlmysqlbig-o

What is the Big-O for SQL select?


What is the Big-O for SQL select, for a table with n rows and for which I want to return m result?

And What is the Big-O for an Update, or delete, or Create operation?

I am talking about mysql and sqlite in general.


Solution

  • As you don't control the algorithm selected, there is no way to know directly. However, without indexes a SELECT should be O(n) (a table scan has to inspect every record which means it will scale with the size of the table).

    With an index a SELECT is probably O(log(n)) (although it would depend on the algorithm used for indexing and the properties of the data itself if that holds true for any real table). To determine your results for any table or query you have to resort to profiling real world data to be sure.

    INSERT without indexes should be very quick (close to O(1)) while UPDATE needs to find the records first and so will be slower (slightly) than the SELECT that gets you there.

    INSERT with indexes will probably again be in the ballpark of O(log(n)^2) when the index tree needs to be rebalanced, closer to O(log(n)) otherwise. The same slowdown will occur with an UPDATE if it affects indexed rows, on top of the SELECT costs.

    Edit: O(log(n^2)) = O(2log(n)) = O(log(n)) did you mean O(log(n)^2)?

    All bets are off once you are talking about JOIN in the mix: you will have to profile and use your databases query estimation tools to get a read on it. Also note that if this query is performance critical you should reprofile from time to time as the algorithms used by your query optimizer will change as the data load changes.

    Another thing to keep in mind... big-O doesn't tell you about fixed costs for each transaction. For smaller tables these are probably higher than the actual work costs. As an example: the setup, tear down and communication costs of a cross network query for a single row will surely be more than the lookup of an indexed record in a small table.

    Because of this I found that being able to bundle a group of related queries in one batch can have vastly more impact on performance than any optimization I did to the database proper.