Search code examples
mysqlsql-serverdatabase-designnormalizationdatabase-normalization

Normalisation with 0/NULL values or can it be designed better?


I am in the process of planning my DB in MySQL for the following scenario -

I am trying to build a play by play system such as NBA where, events such as

 Play event[id, play_type_id, play_outcome_id, points] 

  Substitution event[id, player1_in_id, player2_out_id]  

  Foul event[id, foul_type_id, player_id] 

In the main play_by_play scorecard a particular event should be linked with a particular time and match

Play_by_Play[id, match_id, time_id, play_event_id, substitution_event_id, foul_event_id] 

However assuming at one particular minute only one of the three events takes place- only one event record out of three will have id, and others will be either 0 or NULL. I am left with the question is this good Normalisation for this design or is there any better way to do it ?


Solution

  • I do not think there is only one answer and it is highly dependent on how you use your data.

    One way is to have Play_by_Play table include a single reference to an event and thus avoiding the overhead of NULLs and 0s.

    Play_by_Play[id, match_id, time_id, event_id]
    

    You can define a generic event

    Event_type [event_type_id, type_name]
    Generic_Event[event_id, event_type_id]
    

    and all the other event types can be 1:1 extensions for this generic event type:

    Play event[id, event_id, play_type_id, play_outcome_id, points] 
    Substitution event[id, event_id, player1_in_id, player2_out_id]  
    Foul event[id, event_id, foul_type_id, player_id] 
    

    This design favors normalization and allows to quickly select information about the events regardless of their type (your design requires multiple joins for this).

    However, aggregating information will require more joins (to the tables actually holding relevant data for events) and also Generic_Event might grow very much with potential effects upon performance.

    Your way favors faster aggregation of information, as you can query only Play_by_Play table. E.g.: get number of substitutions and number of fouls within a time frame.

    It might also more space and query execution efficient, as you use less JOINs and do not have a big event table.

    It is unclear what is your actual database engine (you tagged both MySql and SQL Server), but for SQL Server there is a feature that helps you optimize space for tables with lots of NULL values: sparse columns.