Search code examples
sqlgoogle-bigquerystring-agg

SQL BQ Return user's best and worst rated movie


I have data in format:

user_id user_name movie_rating movie_name
1 x 1 asd1
1 x 3 asd2
1 x 5 asd3
1 x 5 asd4
2 y 2 asd4
2 y 3 asd5
2 y 4 asd6

I want to extract the best and the worst rated movie per user in a format where best_rated_movie and worst_rated_movie is array (because user can have multiple worst movies etc):

user_id user_name best_rated_movie worst_rated_movie
1 x (asd3,asd4) asd1
2 y asd6 asd4

I managed to extract table to output where I have only the best and the worst rated movies:

user user_name movie_rating movie_name
1 x 1 asd1
1 x 5 asd3
1 x 5 asd5
2 y 2 asd4
2 y 4 asd6

Here is a code which I used for that:

with best_movie_rating as (
  select
    user_id,
    max(movie_rating) as max_rating
  from source_table
  group by user_id
),

worst_movie_rating as (
  select
    user_id,
    min(movie_rating) as min_rating
  from source_table
  group by user_id
),
columns_final as (
  select
    t1.user_id,
    t1.user_name,
    t1.company_name,
    t1.movie_rating,
    t1.movie_name 
  from source_table t1
  inner join best_movie_rating t2
    on t1.user_id = t2.user_id 
  inner join worst_movie_rating t3
    on t1.user_id = t3.user_id 
  where (t1.movie_rating= t2.max_rating and t1.user_id = t2.user_id)
    or (t1.movie_rating= t3.min_rating and t1.user_id = t3.user_id)
)
select * from columns_final   

Unfortunately I don't know how to move forward, I've tried aggregating with no success (especially with arrays in play). I'd appreciate at least a suggestion how to deal with this.


Solution

  • I want to extract the best and the worst rated movie per user in a format where best_rated_movie and worst_rated_movie is array (because user can have multiple worst movies etc)

    Consider below

    select user_id, user_name,
      array_concat_agg(movies order by movie_rating desc limit 1) best_rated_movie,
      array_concat_agg(movies order by movie_rating limit 1) worst_rated_movie,
    from (
      select user_id, user_name, movie_rating, array_agg(movie_name)  movies, 
      from `project.dataset.table` t
      group by user_id, user_name, movie_rating
    )
    group by user_id, user_name    
    

    If applied to sample data in your question - output is

    enter image description here

    If (as other answer imply) you want result as a string of comma separated movies - you can use below

    select user_id, user_name,
      string_agg(movies order by movie_rating desc limit 1) best_rated_movie,
      string_agg(movies order by movie_rating limit 1) worst_rated_movie,
    from (
      select user_id, user_name, movie_rating, string_agg(movie_name)  movies, 
      from `project.dataset.table` t
      group by user_id, user_name, movie_rating
    )
    group by user_id, user_name
    

    with output

    enter image description here