Search code examples
sqlpostgresqlsql-order-bydistinctaggregate-functions

Ordering distinct column values by (first value of) other column in aggregate function


I'm trying to order the output order of some distinct aggregated text based on the value of another column with something like:

string_agg(DISTINCT sometext, ' ' ORDER BY numval)

However, that results in the error:

ERROR: in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list

I do understand why this is, since the ordering would be "ill-defined" if the numval of two repeated values differs, with that of another lying in-between.

Ideally, I would like to order them by first appearance / lowest order-by value, but the ill-defined cases are actually rare enough in my data (it's mostly sequentially repeated values that I want to get rid of with the DISTINCT) that I ultimately don't particularly care about their ordering and would be happy with something like MySQL's GROUP_CONCAT(DISTINCT sometext ORDER BY numval SEPARATOR ' ') that simply works despite its sloppiness.

I expect some Postgres contortionism will be necessary, but I don't really know what the most efficient/concise way of going about this would be.


Solution

  • If this is part of a larger expression, it might be inconvenient to do a select distinct in a subquery. In this case, you can take advantage of the fact that string_agg() ignores NULL input values and do something like:

    select string_agg( (case when seqnum = 1 then sometext end) order by numval)
    from (select sometext, row_number() over (partition by <whatever>, sometext order by numval) as seqnum
          from t
         ) t
    group by <whatever>
    

    The subquery adds a column but does not require aggregating the data.