Search code examples
androidsqliteandroid-roomdatabase-performance

One big database versus many small databases


My App deals with several similar datasets. That is, they are stored in the same tables, but different data. The user may create more datasets. In any case, these datasets are guaranteed to be disjunct. There will never be any data in one dataset linked somehow to data in another dataset.

I was wondering, would it be better to have a dedicated database for each dataset instead of having all the data in one big database?

I would expect lookup times to improve, if the user works on a smaller database. Is there a rule of thumb, how many entries a database (or table) can hold before I should worry about lookup times?

One drawback I can think of is that opening a database creates some overhead. However, I don't expect the user to switch datasets frequently.

Consider this example:

The database contains tables for companies, clients, products and orders. Companies never share clients or products, thus companies are the disjunct datasets. However, all products, clients and orders are in just one big table (for each, respectively).

Queries to the database might include:

  • All orders for a particular client.
  • All products a particular client has ordered.
  • All clients who have ordered a particular product.
  • etc.

These queries have in common, that they will always be issued in the context of one single company. Yet since the database doesn't know about this logical partition, all clients, products and orders will be searched.

If I were to have several databases, for each company one, my logical partition would be reflected and only the relevant data would be searched. I'm not sure of the overhead of having that many databases though.

Since I'm new to database schema design, I want to throw this idea out there to see, if several databases really are a good idea or not.

Update:

In case this wasn't clear: the database will be on the Android Phone, not in the Cloud or something.


Solution

  • There's no rule of thumb. AFAIK the look-up time doesn't purely depend on number of entries. It depends on several factors such as but not limited to -

    1. how fat the table is
    2. table indexes
    3. how the data is stored e.g. boolean true/false or string YES/NO in the table having 3 million records
    4. hardware size
    5. primary key/foreign key relationship (sort of connected to point 1 above)

    As a general approach, one database theory is advisable. The servers nowadays are quite powerful and there are multiple options when it comes to handling the performance optimisation such as -

    1. cloud databases which give the flexibility to choose the size
    2. BigData
    3. In-memory databases
    4. Analysis services such as SSAS
    5. NoSQL databases which are horizontally scalable e.g. FireStore

    Now, the biggest benefit of using one database is - your development and testing will be quick. What does that mean ? Let's say you need to add/delete/modify one field in one table. Now, if you have 10 different databases then you will need to do the exact same change at 10 different places and then test it as well. If the changes are frequent then you might end-up in writing a generic script. And there is always a chance that this script might break e.g. database change, patch update blah blah. However, in the case of one database, the efforts are straight away 1/10th. Another benefit is database administration/monitoring will be easy e.g. adding indexes.

    I had a similar requirement few months back wherein I've a similar application (mobile+web). The set-up is similar. Different companies access the data. And the user from a particular company is allowed to view data pertaining to his/her company. All I've done is to add one more column assigned as ORGCODE in almost every table. More than 12 clients are happily sharing the tables without any issues.

    Disclaimer: All of the above is quite generic without knowing your use-case and performance requirement.