I recently start learning sql database, and I can't find the time complexity for the case where I have a huge amount of tables.
For example, I have M tables [table_1, table_2, ..., table_m, ..., table_M], and each table has N rows. When I execute "SELECT * from table_m", which time complexity is correct?
I expect O(log(M) + N) is the answer. It uses binary tree for primary key, so I think it also uses binary tree to store tables?
Edit 1: People seems being confused by my question since my inaccurate wording, so I want to explain my question more. I want to know what happen when my database has a lot of tables.
Assume I have two databases. The 1st one has L tables, and each table has N rows. The 2nd database has M tables, and it also has N rows for each table. Will it takes much more time to query a table in the 1st database, when L >> M?
Assume I have two databases. The 1st one has L tables, and each table has N rows. The 2nd database has M tables, and it also has N rows for each table. Will it takes much more time to query a table in the 1st database, when L >> M?
Tables are stored as files, one table per file. You can check with select pg_relation_filepath('tablename');
So the only change will be the overhead in finding that file. That all depends on the filesystem.
Most modern filesystems use some variation on a B-Tree (different from a binary tree) to store directory entries. ext4 uses HTrees. xfs uses B+Trees. These are all O(logm) for search. If you use something that stores directory entries inefficiently, like ext2 which uses a simple list, then it's O(m). Let's assume you're not.
If the PostgreSQL server was naive and reopened the table file every time, it's O(logm + n). However, once the server has opened a table file it can keep it open and no longer needs to traverse the directory entries to access the data in the file. A query will only be O(logm + n) if PostgreSQL didn't already have the table file open, subsequent queries will be O(n).
Unless you're doing something really pathological about your table design, m << n; the number of tables does not practically affect query performance. And if m << n is not true, you have a lot of bigger problems.