Search code examples
sqlms-accessimdb

SQL: How do I find which movie genre a user watched the most? (IMDb personal project)


I'm currently working on a personal project and I could use a little help. Here's the scenario:

I'm creating a database (MS Access) for all of the movies myself and some friends have ever watched. We rated all of our movies on IMDb and used the export feature to get all of the movie data and our movie ratings. I plan on doing some summary analysis on Excel. One thing I am interested in is the most common movie genre that each person watched. Below is my current scenario. Note that the column "const" is the movies' unique IDs. I also have individual tables for each person's ratings and the following tables are the summary tables that make up the combination of all the movies we have watched.

Here's the table I had: https://i.sstatic.net/jnxOy.jpg

I assigned each genre an ID, like this: https://i.sstatic.net/zgPku.jpg

And here is a table where I have separate instances for each movie ID and a unique genre: https://i.sstatic.net/0H8QR.jpg

I want to find a way to count up all of the genres that each person watches. Any advice? I would love to provide any additional information that you need!

Thank you!


Solution

  • You need to have at least one table which has one row per user and const (movie watched). In the 3 example tables you posted nothing shows who watched which movies, which is information you need to solve your problem. You mention having "individual tables for each person's ratings," so I assume you have that information. You will want to combine all of them though, into a table called PERSON_MOVIE or something of the like.

    So let's say your second table is called GENRE and its columns are ID, Genre.

    Let's say your third table is called GENRE_MOVIE and its columns are Const and ID (ID corresponds to ID on the GENRE table)

    Let's say the fourth table, which you did not post, but which is required, is called PERSON_MOVIE and its columns are person, Const, rating.

    You could then write a query like this:

    select vw1.*, ge.genre
      from (select um.person, gm.id as genre_id, count(*) as num_of_genre
              from user_movie um
             inner join genre_movie gm
                on um.const = gm.const
             group by um.person, gm.id) vw1
     inner join (select person, max(num_of_genre) as high_count
                   from (select um.person, gm.id, count(*) as num_of_genre
                           from user_movie um
                          inner join genre_movie gm
                             on um.const = gm.const
                          group by um.person, gm.id) x
                  group by person) vw2
        on vw1.person = vw2.person
       and vw1.num_of_genre = vw2.high_count
     inner join genre ge
        on vw1.genre_id = ge.id
    

    Edit re: your comment:

    So right now you have multiple tables reflecting people's ratings of movies. You need to combine those into a table called PERSON_MOVIE or something similar (as in example above).

    There will be 3 columns on the table: person, const, rating

    I'm not sure if access supports the traditional create table as select query but ordinarily you would be able to construct such a table in the following way:

    create table person_movie as
    select 'Bob', const, [You rated]
    from ratings_by_bob
    union all
    select 'Sally', const, [You rated]
    from ratings_by_sally
    union all
    select 'Jack', const, [You rated]
    from ratings_by_jack
    ....
    

    If not, just combine the tables manually and add a third column as shown indicating what users are reflected by each row. Then you can run my initial query.