Search code examples
databaseperformancedatabase-designtemporal

The best way to store the following temporal data in DB


Imagine we have a set of entities each of which has its state: free, busy or broken. The state is specified for a day, for example, today on 2011-05-17 an entity E1 is free and tomorrow on 2011-05-18 it is busy.

There is a need to store ~10^5 entities for 1000 days. Which is the best way to do so?

I am thinking about 2 options:

  • represent each day as a character "0", "1" or "2" and store for every entity a string of 1000 characters
  • store each day with entity's state in a row, i.e. 1000 rows for an entity

The most important query for such data is: given start date and end date identify which entities are free.

Performance is of higher priority than storage.

All suggestions and comments are welcome.


Solution

  • Create a single table to hold your data. Create the table with an ID, Date, Entity name and eight boolean fields. SQL Server 2008 gave me the code below for the table:

    CREATE TABLE [dbo].[EntityAvailability](
    [EA_Id] [int] IDENTITY(1,1) NOT NULL,
    [EA_Date] [date] NOT NULL,
    [EA_Entity] [nchar](10) NOT NULL,
    [EA_IsAvailable] [bit] NOT NULL,
    [EA_IsUnAvailable] [bit] NOT NULL,
    [EA_IsBroken] [bit] NOT NULL,
    [EA_IsLost] [bit] NOT NULL,
    [EA_IsSpare1] [bit] NOT NULL,
    [EA_IsSpare2] [bit] NOT NULL,
    [EA_IsSpare3] [bit] NOT NULL,
    [EA_IsActive] [bit] NOT NULL,
     CONSTRAINT [IX_EntityAvailability_Id] UNIQUE NONCLUSTERED 
    (
        [EA_Id] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    END
    GO
    
    IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[EntityAvailability]') AND name = N'IXC_EntityAvailability_Date')
    CREATE CLUSTERED INDEX [IXC_EntityAvailability_Date] ON [dbo].[EntityAvailability] 
    (
        [EA_Date] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
    GO
    

    The clustered index on date will perform best for your range searches. Never allow searches without a date range and there will be no need for any index other than the clustered index. The boolean fields allows eight situations using only a single byte. The row size for this table is 35 bytes. 230 rows will fit on a page. You stated you had need to store 10^5 entities for 1000 days which is 100 million. One hundred million rows will occupy 434,782 8K pages or around 3 gig.

    Install the table on an SSD and you are set to go.