Search code examples
sql-serverdatabase-optimizationdatabase-design

Advice needed: SQL Server DB Architecture for Large Database


HI all!

My client currently has a SQL Server database that performs 3-4 Million Inserts, about as many updates and even more reads a day, every day. Current DB is laid out weirdly IMHO: The incoming data goes to "Current" table, then nightly records are moved to corresponding monthly tables (i.e. MarchData, AprilData, MayData etc.), that are exact copies of Current table (schema-wise i mean). Reads are done from view that UNIONs all monthly tables and Current table, Inserts and Updates are done only to Current table. It was explained to me that the separation of data into 13 tables was motivated by the fact that all those tables use separate data files and those data files are written to 13 physical hard drives. So each table gets its own hard drive, supposedly speeding up the view performance. What i'm noticing is that nightly record move to monthly tables (which is done every 2 minutes for the period of night, 8 hours) coincides with full backup and DB starts crawling, web site times out etc.

I was wondering is this approach really the best approach out there? Or can we consider a different approach? Please mind, that the database is about 300-400 GB and growing by 1.5-2 GB a day. Every so often we move records that are more than 12 months old to a separate database (archive).

Any insight is highly appreciated.


Solution

  • It was explained to me that the separation of data into 13 tables was motivated by the fact that all those tables use separate data files and those data files are written to 13 physical hard drives. So each table gets its own hard drive,

    THere is one statement for that: IDIOTS AT WORK.

    • Tables are not stored on discs, but in file spaces which can span multiple data files. Note this... so you can have one file space that has 12 data files on 13discs and a table would be DISTRIBUTED OVER ALL 13 TABLES. No need to play stupid silly games to distribute the load, it is already possible just by reading the documentation.

    • Even then, I seriously doubt 13 discs are fast. Really. I run a smaller database privately (merely 800gb) that has 6 discs for the data alone, and my current work assignment is into three digits of discs (that is 100+). Please, do not name 13 discs a large database.

    • Anyhow, SHOULD the need arive to distribute data, not a UNION but partitioned tables (atgain a standard sql server, albeit enterprise edition feature) is the way to go.

    Please mind, that the database is about 300-400 GB and growing by 1.5-2 GB a day.

    Get a decent server.

    I was wondering is this approach really the best approach out there?

    • Oh, hardware. Get one of the SuperMicro boxes for databases 2 to 4 rack units high, SAS backplane, 24 to 72 slots for discs. Yes, one one computer.

    • Scrap that monthly blabla table crap that someone came up with who obviously shoul not work with databases. All in one table. Use filespaces and multiple data files to handle load distribution for all tables into the various discs. Unless...

    • ...you actually realize that running discs like that is gross neglect. A RAID 5 or RAID 6 or RAID 10 is in order, otherwise your server is possibly down when a disc fails which will happen and resotring a 600gb database takes time. I run RAID 10 for my data discs, but then privately have tables with about a billion rows (and in work we add about that a day). Given the SMALL size of the database, a couple of SSD would also help.... their IOPS budget would mean you could go to possibly 2-3 discs and get a lot more speed out. If that is not possible, my bet is that those discs are slow 3.5" discs with 7200 RPM... an upgade to enterprise level discs would help. I personaly use 300gb Velociraptors for databases, but there are 15k SAS discs to be taken ;)

    Anyho, this sounds really badly set up. So bad I would either be happy my trainee came up with something that smart (as it woul definitely be over the head of a trainee), or my developer would stop working for me the moment I Find that out (based on gross incompetence, feel free to challenge in court)

    Reorganize it. Also be carefull with any batch processing - those NEED to be time staggered so they do not overlap wioth backups. There is only so much IO a mere simple low speed disc can deliver.