I have a table
Name pets
--------------
Andy {dog:2, cat:1, bird:4}
John {tiger:3, elephant:1, fish:2}
Mary {dog:2, pig:2}
I want the find the pet type with maximum count for each person. In the event of tie, duplicate the row for each pet. The result should look like this:
Name max_pet
------------------
Andy bird
John tiger
Mary dog
Mary pig
Currently, I exported the table and do it in python. But I am wondering can I achieve this using Presto/SQL query? Thanks!
There are several ways to do this. One approach is to use UNNEST
to convert the map into rows, with one row per map entry. You can then use the rank()
window function to assign a ranking to the pets for each name, after which you select only the top ranked item(s).
WITH people (name, pets) AS (
VALUES
('Andy', map_from_entries(array[('dog', 2), ('cat', 1), ('bird', 4)])),
('John', map_from_entries(array[('tiger', 3), ('elephant', 1), ('fish', 2)])),
('Mary', map_from_entries(array[('dog', 2), ('pig', 2)]))
)
SELECT name, pet AS max_pet
FROM (
SELECT name, pet, count,
rank() OVER (PARTITION BY name ORDER BY count DESC) rnk
FROM people
CROSS JOIN UNNEST(pets) AS t (pet, count)
)
WHERE rnk = 1;
name | max_pet
------+---------
Andy | bird
John | tiger
Mary | dog
Mary | pig
(4 rows)
Using UNNEST
is simple to understand, but doesn't work well if you need to combine it with other operations, or if you have duplicate names.
Another approach is to convert the map into an array using map_entries()
, use filter()
to select the pet(s) with a count that equals the maximum count, then use transform()
to only return the pet name. At this point, you have an array of the maximum pets. You can then UNNEST
it into multiple rows, or keep it as an array for further processing. filter()
and transform()
utilize a lambda expression which is a Presto specific extension to SQL.
WITH people (name, pets) AS (
VALUES
('Andy', map_from_entries(array[('dog', 2), ('cat', 1), ('bird', 4)])),
('John', map_from_entries(array[('tiger', 3), ('elephant', 1), ('fish', 2)])),
('Mary', map_from_entries(array[('dog', 2), ('pig', 2)]))
)
SELECT
name,
transform(
filter(
map_entries(pets),
e -> e[2] = array_max(map_values(pets))),
e -> e[1]) AS max_pets
FROM people;
name | max_pets
------+------------
Andy | [bird]
John | [tiger]
Mary | [dog, pig]
(3 rows)