I am working through the slides from the UCB CS186 Fall 2020 course, which say:
In terms of disk space management, there are 2 proposals:
I don't get the 2nd proposal. I do understand that leveraging the filesystem is great because it does a lot for us, but:
A DBMS aims to solve this problem: disk is large but slow, and memory is small but fast, so how do we make a database both large and fast? As a result, it has to handle both memory management and disk management.
Typically, a DBMS relies on the OS filesystem for disk management, but for memory management it bypasses OS-level memory mapping (i.e. it avoids mmap) and maintains its own buffer pool.
Disk management: Very few systems (e.g. BlueStore, ScyllaDB) bypass the OS filesystem and talk to the raw device directly. Because of the added complexity, poor portability, and an insignificant speedup (~10% according to Andy Pavlo), this approach is uncommon.
Memory management: Most DBMSs have a logical understanding of their workload and transactions, while the OS is unaware of the relationships between different buffers in memory. This makes it beneficial for the DBMS to manage memory on its own.
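To make that split concrete, here is a minimal sketch in Python (not taken from the slides). The names `DiskManager`, `BufferPool`, `PAGE_SIZE`, and `demo` are hypothetical, and LRU is just an assumed eviction policy chosen to keep the example short. The disk side simply maps a page id to an offset in an ordinary file and lets the filesystem do the rest, while the memory side decides for itself which pages stay cached and when dirty pages are written back.

```python
import os
import tempfile
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical choice)


class DiskManager:
    """Stores fixed-size pages in one ordinary file; the filesystem handles block allocation."""

    def __init__(self, path):
        # Open for binary read/write, creating the file if it does not exist yet.
        mode = "r+b" if os.path.exists(path) else "w+b"
        self.file = open(path, mode)

    def read_page(self, page_id):
        # A page id maps directly to a byte offset in the file.
        self.file.seek(page_id * PAGE_SIZE)
        data = self.file.read(PAGE_SIZE)
        # Pages beyond the end of the file read back as zeros.
        return bytearray(data.ljust(PAGE_SIZE, b"\x00"))

    def write_page(self, page_id, data):
        assert len(data) == PAGE_SIZE
        self.file.seek(page_id * PAGE_SIZE)
        self.file.write(data)
        self.file.flush()

    def close(self):
        self.file.close()


class BufferPool:
    """Caches up to `capacity` pages; eviction is decided here, not by the OS."""

    def __init__(self, disk, capacity):
        self.disk = disk
        self.capacity = capacity
        self.frames = OrderedDict()  # page_id -> bytearray, least recently used first
        self.dirty = set()           # page ids modified since they were loaded

    def get_page(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)  # mark as most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = self.disk.read_page(page_id)
        return self.frames[page_id]

    def mark_dirty(self, page_id):
        self.dirty.add(page_id)

    def _evict(self):
        # Drop the least recently used page, writing it back first if it was modified.
        victim, data = self.frames.popitem(last=False)
        if victim in self.dirty:
            self.disk.write_page(victim, bytes(data))
            self.dirty.discard(victim)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    disk = DiskManager(path)
    pool = BufferPool(disk, capacity=2)
    page = pool.get_page(0)
    page[:5] = b"hello"
    pool.mark_dirty(0)
    pool.get_page(1)
    pool.get_page(2)  # pool is full, so page 0 is evicted and written back
    on_disk = bytes(disk.read_page(0)[:5])
    disk.close()
    return on_disk
```

Calling `demo()` shows the write-back path: page 0 only reaches the file when the pool evicts it. A real DBMS would add pinning, latching, and a write-ahead log on top of this, and could pick a different eviction policy per workload, which is exactly the kind of knowledge the OS does not have when it manages an mmap'ed region.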
Credits to Aashray#4143 and miller#0114 in CMU 15-445 unofficial community (Discord).