Search code examples
sqlprestotrino

Presto query: Find the key with maximum value in a map


I have a table

Name  pets
--------------
Andy  {dog:2, cat:1, bird:4}
John  {tiger:3, elephant:1, fish:2}
Mary  {dog:2, pig:2}

I want the find the pet type with maximum count for each person. In the event of tie, duplicate the row for each pet. The result should look like this:

Name  max_pet
------------------
Andy  bird
John  tiger
Mary  dog
Mary  pig

Currently, I exported the table and do it in python. But I am wondering can I achieve this using Presto/SQL query? Thanks!


Solution

  • There are several ways to do this. One approach is to use UNNEST to convert the map into rows, with one row per map entry. You can then use the rank() window function to assign a ranking to the pets for each name, after which you select only the top ranked item(s).

    WITH people (name, pets) AS (
      VALUES
        ('Andy', map_from_entries(array[('dog', 2), ('cat', 1), ('bird', 4)])),
        ('John', map_from_entries(array[('tiger', 3), ('elephant', 1), ('fish', 2)])),
        ('Mary', map_from_entries(array[('dog', 2), ('pig', 2)]))
    )
    SELECT name, pet AS max_pet
    FROM (
        SELECT name, pet, count,
               rank() OVER (PARTITION BY name ORDER BY count DESC) rnk
        FROM people
        CROSS JOIN UNNEST(pets) AS t (pet, count)
    )
    WHERE rnk = 1;
    
     name | max_pet 
    ------+---------
     Andy | bird    
     John | tiger   
     Mary | dog     
     Mary | pig     
    (4 rows)
    

    Using UNNEST is simple to understand, but doesn't work well if you need to combine it with other operations, or if you have duplicate names.

    Another approach is to convert the map into an array using map_entries(), use filter() to select the pet(s) with a count that equals the maximum count, then use transform() to only return the pet name. At this point, you have an array of the maximum pets. You can then UNNEST it into multiple rows, or keep it as an array for further processing. filter() and transform() utilize a lambda expression which is a Presto specific extension to SQL.

    WITH people (name, pets) AS (
      VALUES
        ('Andy', map_from_entries(array[('dog', 2), ('cat', 1), ('bird', 4)])),
        ('John', map_from_entries(array[('tiger', 3), ('elephant', 1), ('fish', 2)])),
        ('Mary', map_from_entries(array[('dog', 2), ('pig', 2)]))
    )
    SELECT
        name,
        transform(
            filter(
                map_entries(pets),
                e -> e[2] = array_max(map_values(pets))),
            e -> e[1]) AS max_pets
    FROM people;
    
     name |  max_pets  
    ------+------------
     Andy | [bird]     
     John | [tiger]    
     Mary | [dog, pig] 
    (3 rows)