Search code examples
databaserdbmsdatabase-normalization

DBMS - Music genre normalization in a database


So I am trying to wrap my head around the whole "normalization" thing. To understand it better, I have come up with a this case of storing songs

Suppose I have the following db:

Album Table:

album_name| genre           
   album_1| genre_1, genre_2
   album_2| genre_1
   album_3| genre_2

To normalize, I thought of the following approach

Album Table:

album_name| genre_id          
   album_1|        3
   album_2|        1
   album_3|        2

Genre Table:

genre_id| genre_1| genre_2
       0|   false|   false
       1|    true|   false
       2|   false|    true
       3|    true|    true 

Thus, if a new genre pops up, all I need to do is create a new column in genre table and the new corresponding genre_id can be assigned. Well, that will require filling up of all possible combinations, but that will only happen once for every new genre introduced.

Also, what I thought of, will that be considered "normalizing"? From the examples I have seen around, I haven't seen creation of tables with columns that were originally data.


Solution

  • The canonical way of doing this would be to use three tables:

    Album    | 
    album_id | album_name  (and maybe other columns)
    1        | Rumours
    2        | Thriller
    3        | To the Moon and Back
    
    Genre
    genre_id | genre_name  (also maybe other columns)
    1        | rock
    2        | pop
    3        | alternative
    
    AlbumGenre
    album_id | genre_id
    1        | 1
    1        | 2
    2        | 2
    3        | 2
    3        | 3
    

    Normalization is all about avoiding the storage of repetitive data. If you scrutinize this design, you will see that information about albums and genres is stored only once, in each respective table. Then, the AlbumGenre table stores the relationships between albums and the various genres. This table is usually called a "bridge" table, because it links albums to their genres.

    The problem with your proposed Genre table is that it repeats information about relationships even if those relationships don't exist. Furthermore, this approach won't scale well at all if you need to add more genres to the database.