Tags: sql, sql-server, sql-server-2008-r2, normalization, database-normalization

1 or many sql tables for persisting "families" of properties about one object?


Our application (using a SQL Server 2008 R2 back-end) stores data about remote hardware devices reporting back to our servers over the Internet. There are a few "families" of information we have about each device, each stored by a different server application into a shared database:

  • static configuration information entered by users through our web app, e.g. Physical Location, Friendly Name, etc.
  • logged information about device behavior, e.g. last reporting time, date the device first came online, whether device is healthy, etc.
  • expensive information re-computed by scheduled jobs, e.g. average signal strength, average length of transmission, historical failure rates, etc.

These properties are all scalar values reflecting the most current data we have about a device. We have a separate way to store historical information.

The largest number of device instances we have to worry about will be around 100,000, so this is not a "big data" problem. In most cases a database will have 10,000 devices or less to worry about.

Writes to the data about an individual device happen infrequently -- typically every few hours. It's theoretically possible for a scheduled task, user-entered configuration changes, and dynamic data to all update the same device at the same time, but this seems very rare. Reads are more frequent: probably ten reads per minute against at least one device in a database, and several times per hour a full scan of some properties of all devices described in a database.

Deletes are relatively rare; in fact, in many cases we only "soft delete" devices so we can use them for historical reporting. New device inserts are more common, perhaps a few every day.

There are (at least) two obvious ways to store this data in our SQL database:

  1. The current design of our application stores each family of information in a separate table, each with a clustered index on a Device ID primary key. Each server application writes to its own table.
  2. An alternate implementation that's been proposed is to use one large table, and create covering indexes as needed to accelerate queries for groups of properties (e.g. all static info, all reliability info, etc.) that are frequently queried together. (A sketch of both options follows this list.)
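For concreteness, here is a minimal T-SQL sketch of both options. The table and column names are hypothetical stand-ins for the properties described above, not our actual schema.

    -- Option 1: one table per family, each clustered on the DeviceID primary key
    CREATE TABLE dbo.DeviceConfig (       -- static, user-entered configuration
        DeviceID         int           NOT NULL PRIMARY KEY CLUSTERED,
        FriendlyName     nvarchar(100) NULL,
        PhysicalLocation nvarchar(200) NULL
    );

    CREATE TABLE dbo.DeviceStatus (       -- logged device behavior
        DeviceID        int      NOT NULL PRIMARY KEY CLUSTERED,
        LastReportTime  datetime NULL,
        FirstOnlineDate datetime NULL,
        IsHealthy       bit      NULL
    );

    CREATE TABLE dbo.DeviceStats (        -- expensive re-computed aggregates
        DeviceID              int   NOT NULL PRIMARY KEY CLUSTERED,
        AvgSignalStrength     float NULL,
        AvgTransmissionLength float NULL,
        FailureRate           float NULL
    );

    -- Option 2: one wide table, with a covering index per frequently-queried
    -- group of properties so a full scan of one family reads only a narrow index
    CREATE TABLE dbo.Device (
        DeviceID              int           NOT NULL PRIMARY KEY CLUSTERED,
        FriendlyName          nvarchar(100) NULL,
        PhysicalLocation      nvarchar(200) NULL,
        LastReportTime        datetime      NULL,
        FirstOnlineDate       datetime      NULL,
        IsHealthy             bit           NULL,
        AvgSignalStrength     float         NULL,
        AvgTransmissionLength float         NULL,
        FailureRate           float         NULL
    );

    CREATE NONCLUSTERED INDEX IX_Device_Status
        ON dbo.Device (DeviceID)
        INCLUDE (LastReportTime, FirstOnlineDate, IsHealthy);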

My question: is there a clearly superior option? If the answer is "it depends" then what are the circumstances which would make "one large table" or "multiple tables" better?

Answers should consider: performance, maintainability of the DB itself, maintainability of code that reads/writes rows, and reliability in the face of unexpected behavior. Maintainability and reliability are probably a higher priority for us than performance, if we have to trade off.


Solution

  • I don't know of a clearly superior option, and I'm not deeply familiar with SQL Server architecture specifically, but I would go for the first option, with separate tables for the different families of data. Some advantages could be:

    • granting access to specific sets of data at the table level (may be desirable for future applications; see the sketch after this list)

    • archiving different families of data at different rates

    • partial functionality of the application during maintenance on one part (some tables remain available while another is restored)

    • indexing and partitioning/sharding can be performed on different attributes (static information could be partitioned on device id, logging information on date)

    • different families can be assigned to different cache areas (so the static data can remain in a more "static" cache, and more rapidly changing logging type data can be in another "rolling" cache area)

    • smaller rows pack more rows into a block, which means fewer block reads when scanning a table for a specific attribute

    • less chance of row chaining if you alter a table to add a column, and easier to perform maintenance if you do

    • easier to understand the data when separated into logical units (families)
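
    As a rough illustration of the access-granting and indexing points above (assuming the separate-tables layout; the table, role, and column names below are hypothetical):

        -- A future reporting application can be limited to just one family
        CREATE ROLE ReportingApp;
        GRANT SELECT ON dbo.DeviceStats TO ReportingApp;

        -- The logging family can be indexed on the attribute it is queried by
        -- (date), independently of how the static configuration table is laid out
        CREATE NONCLUSTERED INDEX IX_DeviceStatus_LastReportTime
            ON dbo.DeviceStatus (LastReportTime)
            INCLUDE (IsHealthy);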

    I wouldn't consider the joins a disadvantage when the tables are properly indexed (see the sketch below). But more tables do mean more moving parts and a greater need for awareness/documentation of what is going on.
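
    A rough sketch of such a join against the hypothetical family tables above; with each table clustered on DeviceID, reassembling a "whole device" row is just a few cheap seeks:

        -- Reassemble a full device row from the family tables
        DECLARE @DeviceID int = 12345;   -- hypothetical device

        SELECT c.DeviceID,
               c.FriendlyName,
               s.LastReportTime,
               s.IsHealthy,
               t.AvgSignalStrength
        FROM dbo.DeviceConfig AS c
        JOIN dbo.DeviceStatus AS s ON s.DeviceID = c.DeviceID
        JOIN dbo.DeviceStats  AS t ON t.DeviceID = c.DeviceID
        WHERE c.DeviceID = @DeviceID;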