Search code examples
databaseindexingsoftware-design

Database Software Design Review for a Key Value Single Server DB


Context on the problem statement. Scroll to the bottom for questions.

Note: The tables are not relational, joins can be done at application level.

Classes

Record

Most atomic unit of the database (each record has key, value, id)

Page

Each file can store multiple records. Each page is a limited chunk (8 kb??), and it also stores an offset to retrieve each id at the top?

Index

A B-tree data structure, that stores ability to do log(n) lookups to find which id lives in which page. We can also insert id's and page into this B-tree.

Table

Each Table is an abstraction over a directory that stores multiple pages. Table also stores Index.

Database

Database is an abstraction over a directory which includes all tables that are a part of that database.

Database Manager

Gives ability to switch between different databases, create new databases, and drop existing databases.

Communication In Main Process

Initiates the Database Manager as it's own process. When the process quits it saves Indexes back to disk. The process also stores the indexes back to disk based on an interval. To interact with this DB process we will use http to communicate with it.

Database Manager stores a reference to the current database being used. The current database attribute stored in the Database Manager stores a reference to all Table's in a hashmap. Each Table stores a reference to the index that is read from the index page from disk and kept in memory. Each Table exposes public methods to set and get key value pair. Get method navigates through b-tree to find the right page, on that page it finds the key val pair based on the offset stored on the first line, and returns a Record.

Each Set method adds a key val pair to the database and then updates the index for that table.

Outstanding Questions:

  • Am I making any logical errors in my design above?
  • How should I go about figuring what the data page size should be (Not sure why relation DB's do 8gb)?
  • How should I store the Index B-tree to disk?
  • Should the Database load all indexes for the table in memory at the very start ?

Solution

  • A couple of notes from the top of my head:

    How many records do you anticipate storing? What are the maximum key and value sizes? I ask, because with a file per page scheme, you might find yourself exhausting available file handles.

    Are the database/table distinctions necessary? What does this separation gain you? Truly asking the question, not being socratic.

    I would define page size in terms of multiples of your maximum key and value sizes so that you can get good read/write alignment and not have too much fragmentation. It might be worth having a naive, but space inefficient, implementation that aligns all writes.

    I would recommend starting with the simplest possible design (load all indices, align writes, flat schema) to launch your database, then layer on complexity and optimizations as they become needed, but not a moment before. Hope this helps!