sql postgresql aggregate-functions greenplum

Find modal values in each row, with custom tie breaking in PostreSQL

I have table like

create table  vals (k, v, z, y, m) AS VALUES 
(0, 0, 1, 8, 0), 
(1, 2, 1, 1, 6), 
(1, 0, 0, 2, 2),
(5, 4, 6, 8, 9), 
(0, 0, 0, 6, 6);

I want to calculate modal value for each row. If there are multiple modal values, return such value, that occurs first.

k	v	z	y	m	modal
0	0	1	8	0	0
1	2	1	1	6	1
1	0	0	2	2	0
5	4	6	8	9	5
0	0	0	6	6	0

In the 3rd row values 0 and 2 are equally frequent but 0 occurs first so it wins.
In the 4th row all values are equally frequent but 5 occurs first so it wins.

How can I do it in PostreSQL?

Solution

PostgreSQL has a built-in mode()within group(order by e) function:

mode () WITHIN GROUP ( ORDER BY anyelement ) → anyelement
Computes the mode, the most frequent value of the aggregated argument (arbitrarily choosing the first one if there are multiple equally-frequent values). The aggregated argument must be of a sortable type.

Which matches the Wikipedia definition of a modal value:

Type Description Example Result

Arithmetic mean Sum of values of a data set divided by number of values (1+2+2+3+4+7+9) / 7 4

Median Middle value separating the greater and lesser halves of a data set 1, 2, 2, 3, 4, 7, 9 3

Mode Most frequent value in a data set 1, 2, 2, 3, 4, 7, 9 2

Type	Description	Example	Result
Arithmetic mean	Sum of values of a data set divided by number of values	(1+2+2+3+4+7+9) / 7	4
Median	Middle value separating the greater and lesser halves of a data set	1, 2, 2, 3, 4, 7, 9	3
Mode	Most frequent value in a data set	1, 2, 2, 3, 4, 7, 9	2

The function is an ordered-set aggregate, not a variadic, so you'd have to pretend values in each row are a column. You can pack them into an array[], then unnest() it, then process it and return that, all inside a correlated scalar subquery:
_{demo at db-fiddle}

WITH arrs as(select *,array[k, v, z, y, m] as arr 
             from vals)
select k, v, z, y, m
      ,(select mode()within group(order by e) 
        from unnest(arr) as e) as modal
from arrs;

Or try unpivoting, then re-aggregating:

WITH one_column as(select ctid,k as c from vals
                   union all
                   select ctid,v from vals
                   union all
                   select ctid,z from vals
                   union all
                   select ctid,y from vals
                   union all
                   select ctid,m from vals)
select min(vals.k)
      ,min(vals.v)
      ,min(vals.z)
      ,min(vals.y)
      ,min(vals.m)
      ,mode()within group(order by c)
from one_column
join vals on one_column.ctid=vals.ctid
group by ctid;

Your updated example breaks ties in a different way than mode() does:

arbitrarily choosing the first one if there are multiple equally-frequent values

It looks like what you want is for the one earlier in the unordered list to win.
To achieve that behaviour, you can just count(*) values in each row (packed into array[], then unnest()ed), sort this by placing most frequent first, and break ties by promoting the one that appeared earlier. unnest()with ordinality adds the information about the original position of the unpacked element.
_{demo at db-fiddle}

WITH arrs as(select *,array[k, v, z, y, m] as arr 
             from vals)
select k, v, z, y, m
      ,(select e 
        from unnest(arr)with ordinality as _(e,ordinality)
        group by e
        order by count(*) desc
                ,min(ordinality) asc--earlier in the undordered list wins
                --,e desc--larger one wins
        limit 1)
from arrs;

To get all modal value candidates, you can use fetch first 1 rows with ties. This also uses a special array() constructor that accepts a subquery directly:
_{demo at db-fiddle}

WITH arrs as(select *,array[k, v, z, y, m] as arr 
             from vals)
select k, v, z, y, m
      ,array(select unnest(arr)e
             group by e
             order by count(*) desc
             fetch first 1 rows with ties
       ) as all_modal_value_candidates
from arrs;

In older versions of PostgreSQL, you can emulate that using rank()over():

WITH arrs as(select *,array[k, v, z, y, m] as arr 
             from vals)
select k, v, z, y, m
      ,(select array_agg(e)
        from(select unnest(arr)e, rank()over(order by count(*) desc)
             group by e)_
        where rank=1
       ) as all_modal_value_candidates
from arrs;