Tags: sql, postgresql, big-o

Time complexity of selecting rows from one table out of M tables in PostgreSQL


I recently started learning SQL databases, and I can't find the time complexity for the case where I have a huge number of tables.

For example, I have M tables [table_1, table_2, ..., table_m, ..., table_M], and each table has N rows. When I execute "SELECT * FROM table_m", which time complexity is correct?

  1. O(M + N)
  2. O(log(M) + N)
  3. O(N)

I expect O(log(M) + N) to be the answer. It uses a binary tree for the primary key, so I think it also uses a binary tree to store the tables?

Edit 1: People seem to be confused by my question because of my inaccurate wording, so I want to explain it further. I want to know what happens when my database has a lot of tables.

Assume I have two databases. The first one has L tables, and each table has N rows. The second database has M tables, and each of its tables also has N rows. Will it take much more time to query a table in the first database when L >> M?
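
For what it's worth, here is how I guess I could test this myself; the loop count and table names below are just placeholders for an experiment, not my real schema:

    -- Create many small tables (experiment only)
    DO $$
    BEGIN
        FOR i IN 1..1000 LOOP
            EXECUTE format('CREATE TABLE table_%s (id int, val text)', i);
        END LOOP;
    END $$;

    -- Then time a query against one of them (\timing is a psql command)
    \timing on
    SELECT * FROM table_500;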


Solution

  • Assume I have two databases. The first one has L tables, and each table has N rows. The second database has M tables, and each of its tables also has N rows. Will it take much more time to query a table in the first database when L >> M?

    Tables are stored as files, one table per file. You can check with select pg_relation_filepath('tablename');
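
    For example (the table name and the OIDs in the output below are made up; the actual path depends on your data directory, the database OID, and the table's filenode):

        SELECT pg_relation_filepath('table_m');

         pg_relation_filepath
        -----------------------
         base/16384/16490
        (1 row)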

    So the only change will be the overhead in finding that file. That all depends on the filesystem.

    Most modern filesystems use some variation on a B-tree (different from a binary tree) to store directory entries. ext4 uses HTrees; XFS uses B+trees. These are all O(log m) for search. If you use something that stores directory entries inefficiently, like ext2, which uses a simple list, then it's O(m). Let's assume you're not.
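
    To see which filesystem those files actually live on, you can look up the cluster's data directory (the path below is only an example; reading this setting may require elevated privileges):

        -- Table files live under <data_directory>/base/
        SHOW data_directory;

               data_directory
        -----------------------------
         /var/lib/postgresql/16/main
        (1 row)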

    If the PostgreSQL server were naive and reopened the table file every time, it would be O(log m + n). However, once the server has opened a table file, it can keep it open and no longer needs to traverse the directory entries to access the data in the file. A query will only be O(log m + n) if PostgreSQL didn't already have the table file open; subsequent queries will be O(n).
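
    A rough way to see this in practice is to time the same query twice from psql. Treat the numbers only as an illustration: the OS page cache and shared_buffers usually dominate the difference, not just the directory lookup:

        \timing on
        -- First run: may include locating and opening the table's file
        SELECT count(*) FROM table_m;
        -- Second run: the file is already open, so only the O(n) scan remains
        SELECT count(*) FROM table_m;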

    Unless you're doing something really pathological with your table design, m << n; in practice, the number of tables does not affect query performance. And if m << n does not hold, you have much bigger problems.
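
    If you want to check how m compares to n on your own cluster, the system catalogs give a quick answer (reltuples is only a planner estimate, not an exact row count):

        -- m: number of ordinary tables in the current database
        SELECT count(*) AS table_count
        FROM pg_class
        WHERE relkind = 'r';

        -- n: estimated rows for the largest tables
        SELECT relname, reltuples::bigint AS estimated_rows
        FROM pg_class
        WHERE relkind = 'r'
        ORDER BY reltuples DESC
        LIMIT 10;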