I'm enrolled in DBM/BI certificate program (crash course more like) and I decided to embark on an independent project to sort of implement everything i'm learning in real time. Long story short, Ill be analyzing data (boxofficemojo.com) about the top grossing 130 movies from the last 13 years ( using MySQL server/workbench. ). First i'd like to map out a schema and then do some data mining/visualization. Here's how i've split it up so far:
"Movies"
Movie_ID (Primary )
Dom_Revenue
Int_Revenue
OpWe_Revenue
Budget
"Rating"
Rating_ID (P)
Rating
"Release"
Release_ID (P)
Year
Month
Day
Movie_ID (F)
"Cast"
Director_Gender (P)
Lead_Gender (P)
Director_Name
Director_Name
Movie_ID (F)
"Studio"
Studio_ID (P)
Studio_Name
and these are my relationships so far:
rating to movies - one to many ( many movies can be rated R , a movie can only have 1 rating )
release to movies - one to many ( many movies can be released on the same weekend, a movie can only be released once)
cast to movies - one to many (directors/actors can make many movies, a movie can only have one cast)
studio to movies - many to many (movies can be attached to more than one studio, a studio can make more than one movie)
I know the schema is most likely not 100% correct so should i include the primary keys from all the other tables as foreign keys in the "movies" table? and how are my relationships?
thanks in advance
This is related to the first answer by Leo but I'll be more specific and I add more observations.
First, Release
attributes are functionally dependent on Movie_ID
(or Movies in general) so it should not be a separate Entity
.
Second, and in relation to the first, you have Year
, Month
and Day
in your Release Entity why not make it as Release_Date which has Year
, Month
and Day
anyway?
Then you could make again your Release
attributes as part of your Movie
.
Third, and in relation to the first why not add a Movie_Title
field?
So, in all-in-all then you could have the following schema:
"Movies"
Movie_ID (Primary )
Movie_Title
Dom_Revenue
Int_Revenue
OpWe_Revenue
Budget
Release_Date
You could easily query movies that are release in a certain Year
like:
SELECT Movie_Title, Year(Release_Date) as Release_Year
FROM Movies
WHERE Year(Release_Date) = 2011
Or you could count it also by Year
(or by Month
)
SELECT Year(Release_Date) as Release_Year, COUNT(*) Number_of_Movies_in_a_Year
FROM Movies
GROUP BY Year(Release_Date)
ORDER BY Year(Release_Date)
Fourth, in your Cast
entity you said "Directors/Actors can make many movies, a movie can only have one cast". But looking at your Cast
you have a Movie
attribute which is a FK
(Foreign Key) from Movies
and that means by the way that a Movie
could have many Cast
because the FK
is always in the many side. And besides this entity is almost like a violation of the 4NF (Fourth Normal Form). So, the best way probably to do this is to make specialization in your Cast
table and relate it to Movies
table so that it would have One-to-Many
relationship or a Cast
or Director
could have many movies. So, it would look like this:
"Cast"
Cast_ID (PK)
Cast_Name
Cast_Gender
Cast_Type (values here could either be Director or Lead or could be simply letters like D or L)
And your Movies
table could now be changed to like this:
"Movies"
Movie_ID (Primary )
Movie_Title
Dom_Revenue
Int_Revenue
OpWe_Revenue
Budget
Release_Date
Lead_ID (FK)
Cast_ID (FK)
Lastly, you said "movies can be attached to more than one studio, a studio can make more than one movie". A Many-to-many
relationship usually has a bridge table
to create the many-to-many
relationship between entities. So, let's say you have a Studio_Movie
entity/table as your bridge table then you will have like this:
"Studio_Movie"
Studio_ID (PK, FK1)
Movie_ID (PK, FK2)