So I am trying to wrap my head around the whole "normalization" thing. To understand it better, I have come up with a this case of storing songs
Suppose I have the following db:
Album Table:
album_name| genre
album_1| genre_1, genre_2
album_2| genre_1
album_3| genre_2
To normalize, I thought of the following approach
Album Table:
album_name| genre_id
album_1| 3
album_2| 1
album_3| 2
Genre Table:
genre_id| genre_1| genre_2
0| false| false
1| true| false
2| false| true
3| true| true
Thus, if a new genre pops up, all I need to do is create a new column in genre table and the new corresponding genre_id
can be assigned. Well, that will require filling up of all possible combinations, but that will only happen once for every new genre introduced.
Also, what I thought of, will that be considered "normalizing"? From the examples I have seen around, I haven't seen creation of tables with columns that were originally data.
The canonical way of doing this would be to use three tables:
Album |
album_id | album_name (and maybe other columns)
1 | Rumours
2 | Thriller
3 | To the Moon and Back
Genre
genre_id | genre_name (also maybe other columns)
1 | rock
2 | pop
3 | alternative
AlbumGenre
album_id | genre_id
1 | 1
1 | 2
2 | 2
3 | 2
3 | 3
Normalization is all about avoiding the storage of repetitive data. If you scrutinize this design, you will see that information about albums and genres is stored only once, in each respective table. Then, the AlbumGenre
table stores the relationships between albums and the various genres. This table is usually called a "bridge" table, because it links albums to their genres.
The problem with your proposed Genre
table is that it repeats information about relationships even if those relationships don't exist. Furthermore, this approach won't scale well at all if you need to add more genres to the database.