Search code examples
sqlgroup-byaggregate

How to Group By column, while keeping a naming column in as well


I am trying to show the most popular TV show in each country. However, the resulting table outputs multiple shows from the same country, if I include the column that has the shows name. If I don't include this column, it correctly outputs the MAX for eacg country, but without the show name. Can I include both?

This is the script that gets the result I want without the names.

SELECT
origin_country, MAX(popularity) as Most_popular
FROM TV_data
WHERE origin_country not like '%(%'
GROUP BY origin_country
order by Most_popular DESC

Table

This is the script that results in multiple shows from the same country, since the name column is grouped as well.

SELECT
origin_country, name, MAX(popularity) as Most_popular
FROM TV_data
WHERE origin_country not like '%(%'
GROUP BY origin_country, name
order by Most_popular DESC

Table

Thnka you, still learning SQL so any advice is greatly appreciated.


Solution

  • Your idea is correct to GROUP BY origin_country and use MAX to find the highest popularity per country.

    All you need to do now is to put this in a subquery, build a main query which shows the other columns, too and JOIN them:

    SELECT 
    tv1.origin_country,
    tv1.name,
    tv1.popularity Most_Popular
    FROM tv_data tv1
    JOIN (
    SELECT origin_country, MAX(popularity) popularity
    FROM tv_data
    GROUP BY origin_country) tv2
    ON tv1.origin_country = tv2.origin_country 
    AND tv1.popularity = tv2.popularity
    WHERE tv1.origin_country NOT LIKE '%(%'
    ORDER BY tv1.popularity DESC;
    

    The above query will be executed on every DB.

    Today, DB's usually provide window functions for that as another and maybe easier option. The exact syntax for this way depends on the DB you use since functions often differ between OracleDB, MYSQL DB etc.

    Here is an example for a SQLServer DB using RANK:

    SELECT 
    origin_country,
    name,
    popularity Most_Popular
    FROM (SELECT origin_country,
    name,
    popularity,
    RANK() OVER(PARTITION BY origin_country ORDER BY popularity DESC) dest_rank
    FROM tv_data) sub
    WHERE dest_rank = 1
    AND origin_country NOT LIKE '%(%'
    ORDER BY popularity DESC;
    

    The PARTITION BY clause works like the GROUP BY in the first query.

    If you change for example the condition dest_rank = 1 to dest_rank < 3, you will get the two most popular shows per country.

    Try out here: db<>fiddle