I am trying to show the most popular TV show in each country. However, the resulting table outputs multiple shows from the same country, if I include the column that has the shows name. If I don't include this column, it correctly outputs the MAX for eacg country, but without the show name. Can I include both?
This is the script that gets the result I want without the names.
SELECT
origin_country, MAX(popularity) as Most_popular
FROM TV_data
WHERE origin_country not like '%(%'
GROUP BY origin_country
order by Most_popular DESC
This is the script that results in multiple shows from the same country, since the name column is grouped as well.
SELECT
origin_country, name, MAX(popularity) as Most_popular
FROM TV_data
WHERE origin_country not like '%(%'
GROUP BY origin_country, name
order by Most_popular DESC
Thnka you, still learning SQL so any advice is greatly appreciated.
Your idea is correct to GROUP BY
origin_country and use MAX
to find the highest popularity per country.
All you need to do now is to put this in a subquery, build a main query which shows the other columns, too and JOIN
them:
SELECT
tv1.origin_country,
tv1.name,
tv1.popularity Most_Popular
FROM tv_data tv1
JOIN (
SELECT origin_country, MAX(popularity) popularity
FROM tv_data
GROUP BY origin_country) tv2
ON tv1.origin_country = tv2.origin_country
AND tv1.popularity = tv2.popularity
WHERE tv1.origin_country NOT LIKE '%(%'
ORDER BY tv1.popularity DESC;
The above query will be executed on every DB.
Today, DB's usually provide window functions for that as another and maybe easier option. The exact syntax for this way depends on the DB you use since functions often differ between OracleDB, MYSQL DB etc.
Here is an example for a SQLServer DB using RANK
:
SELECT
origin_country,
name,
popularity Most_Popular
FROM (SELECT origin_country,
name,
popularity,
RANK() OVER(PARTITION BY origin_country ORDER BY popularity DESC) dest_rank
FROM tv_data) sub
WHERE dest_rank = 1
AND origin_country NOT LIKE '%(%'
ORDER BY popularity DESC;
The PARTITION BY
clause works like the GROUP BY
in the first query.
If you change for example the condition dest_rank = 1
to dest_rank < 3
, you will get the two most popular shows per country.
Try out here: db<>fiddle