We have 10 years of archived sports data, spread across separate databases.
Trying to consolidate all the data into a single database. Since we'll be handling 10X the number of records, I'm trying to make schema redesign changes now to avoid potential performance hit.
One change entails breaking up the team roster table into 2 tables; one, a players table that stores fixed data: playerID, firstName, lastName, birthDate, etc., and another, the new roster table that stores variable data about a player: yearInSchool, jerseyNumber, position, height, weight, etc. This will allow us to, among other things, create career 4 year aggregate views of player stats.
Fair enough, makes sense, but then again, when I look at queries that tally, for example, a players aggregate scoring stats, I have to join on both player & roster tables, in addition to scoring and schedule tables, in order to get all the information needed.
Where I'm considering denormalizing is with player first and last name. If I store player first and last name in the roster table, then I can omit the player table from the equation for stat queries, which I'm assuming will be a big performance win given that total record count per table will be over 100K (i.e. most query joins will be across tables that each contain at least 100K records, and up to, for now, 300K).
So, where to draw the line with denormalization in this case? I assume duplicating first, last name is OK. Generally I enjoy non-duplication/integrity of data, but I suspect site visitors enjoy performance more!
First thought is, are you sure you've exhausted tuning options to get good SELECT performance without denormalising here?
I'm very much with you in the sense of "no sacred cows" and denormalise when necessary, but this sounds like a case where it shouldn't be too hard to get decent performance.
Of course you guys have done your own exploration, if you've ruled that out then personal opinion is it's acceptable, yeah.
One issue - what happens if a player's name changes? Can it do so in your system? Would you use a transation to update all roster details in a single COMMIT operation? For a historical records Db this could be totally irrelevant mind you.