Search code examples
sqlpostgresqlplpgsqlwindow-functionsgaps-and-islands

GROUP BY and aggregate sequential numeric values


Using PostgreSQL 9.0.

Let's say I have a table containing the fields: company, profession and year. I want to return a result which contains unique companies and professions, but aggregates (into an array is fine) years based on numeric sequence:

Example Table:

+-----------------------------+
| company | profession | year |
+---------+------------+------+
| Google  | Programmer | 2000 |
| Google  | Sales      | 2000 |
| Google  | Sales      | 2001 |
| Google  | Sales      | 2002 |
| Google  | Sales      | 2004 |
| Mozilla | Sales      | 2002 |
+-----------------------------+

I'm interested in a query which would output rows similar to the following:

+-----------------------------------------+
| company | profession | year             |
+---------+------------+------------------+
| Google  | Programmer | [2000]           |
| Google  | Sales      | [2000,2001,2002] |
| Google  | Sales      | [2004]           |
| Mozilla | Sales      | [2002]           |
+-----------------------------------------+

The essential feature is that only consecutive years shall be grouped together.


Solution

  • There's much value to @a_horse_with_no_name's answer, both as a correct solution and, like I already said in a comment, as a good material for learning how to use different kinds of window functions in PostgreSQL.

    And yet I cannot help feeling that the approach taken in that answer is a bit too much of an effort for a problem like this one. Basically, what you need is an additional criterion for grouping before you go on aggregating years in arrays. You've already got company and profession, now you only need something to distinguish years that belong to different sequences.

    That is just what the above mentioned answer provides and that is precisely what I think can be done in a simpler way. Here's how:

    WITH MarkedForGrouping AS (
      SELECT
        company,
        profession,
        year,
        year - ROW_NUMBER() OVER (
          PARTITION BY company, profession
          ORDER BY year
        ) AS seqID
      FROM atable
    )
    SELECT
      company,
      profession,
      array_agg(year) AS years
    FROM MarkedForGrouping
    GROUP BY
      company,
      profession,
      seqID