
Database Schema Design - Tips for improving ability to archive?


I am designing a database table that will store log entries from the application. A few things are making me think about this design more than usual:

  • These log entries will be used at runtime by the system to make decisions, so they need to be relatively fast to access.
  • There are going to be a lot of them (my estimate is 12.5 million added per month).
  • I don't need more than the last 30 to 45 days at most for the decision processing.
  • I need to keep all of them for much longer than 45 days for support and legal reasons, likely at least 2 years.
  • The table design is fairly simple: all simple types (no blobs or anything), the database engine will supply default values where possible, and there is at most one foreign key.
  • If it makes any difference, the database will be Microsoft SQL Server 2005.

What I was thinking is to have them written to a live table/database and then use an ETL solution to move "old" entries to an archive table/database, which is big and on slower hardware.

My question is: do you know of any tips, tricks or suggestions for the database/table design to make sure this works as well as possible? Also, if you think it's a bad idea, please let me know what you think a better idea would be.


Solution

  • Some databases offer "partitions" (Oracle, for example). A partitioned table behaves like a view that collects several tables with an identical definition into one. You define criteria that route new rows into the different tables (for example, the month, or week-of-year % 6).

    From a user's point of view, this is just one table. From the database's point of view, it's several independent tables, so you can run whole-table commands (truncate, drop, delete without a condition, load/dump, etc.) against each of them efficiently.

    If you can't use partitions, you can get a similar effect with views. Collect several tables in a single view and redefine the view, say, once a month, to "free" one table holding old data from the rest. You can then efficiently archive that table, clear it, and attach it to the view again once the heavy work is done. This should greatly improve performance.

    [EDIT] SQL Server 2005 onwards (Enterprise Edition) supports partitions. Thanks to Mitch Wheat.
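
The view-based rotation described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not SQL Server code: it uses Python's built-in sqlite3 module as a stand-in database, and the names `log_0`..`log_2`, `log_all`, `log_archive`, `slot_for` and `archive_slot` are all hypothetical. It also uses three rotating tables keyed by month % 3 to keep the example short. On SQL Server you would use TRUNCATE TABLE (or real partitions) where this sketch uses an unconditional DELETE.

```python
import sqlite3

SLOTS = 3  # hypothetical number of rotating tables (the answer suggests e.g. 6)


def slot_for(month: int) -> int:
    """Route a calendar month (1-12) to one of the rotating tables."""
    return month % SLOTS


def rebuild_view(conn, slots):
    """Redefine the log_all view as a UNION ALL over the given slot tables."""
    conn.execute("DROP VIEW IF EXISTS log_all")
    selects = " UNION ALL ".join(f"SELECT * FROM log_{s}" for s in slots)
    conn.execute(f"CREATE VIEW log_all AS {selects}")


def archive_slot(conn, slot):
    """Detach one table from the view, archive its rows, clear it, reattach it."""
    others = [s for s in range(SLOTS) if s != slot]
    rebuild_view(conn, others)  # readers of log_all no longer see the old table
    conn.execute(f"INSERT INTO log_archive SELECT * FROM log_{slot}")
    conn.execute(f"DELETE FROM log_{slot}")  # unconditional delete; SQLite has no TRUNCATE
    rebuild_view(conn, range(SLOTS))  # reattach the now-empty table
    conn.commit()


conn = sqlite3.connect(":memory:")
for s in range(SLOTS):
    conn.execute(f"CREATE TABLE log_{s} (logged_month INTEGER, message TEXT)")
conn.execute("CREATE TABLE log_archive (logged_month INTEGER, message TEXT)")
rebuild_view(conn, range(SLOTS))

# Writers pick the table from the routing rule; readers only ever query log_all.
for month, message in [(1, "jan entry"), (2, "feb entry"), (3, "mar entry")]:
    conn.execute(f"INSERT INTO log_{slot_for(month)} VALUES (?, ?)", (month, message))
conn.commit()

archive_slot(conn, slot_for(1))  # treat January as "old" and move it out

live = conn.execute("SELECT message FROM log_all ORDER BY logged_month").fetchall()
archived = conn.execute("SELECT message FROM log_archive").fetchall()
print(live)
print(archived)
```

After the run, `log_all` returns only the February and March rows, while the January row lives in `log_archive`. The point of the design is that the expensive steps (copy and clear) touch a whole table that no reader is using at that moment, rather than deleting rows by condition from one large live table.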