Search code examples
sqlpostgresqlaggregation

Combinational SQL count query in Postgres


I have some stations stored in a simple stations collection:

+----+-----------+
| id | name      |
+----+-----------+
| 1  | Station A |
+----+-----------+
| 2  | Station B |
+----+-----------+
| 3  | Station C |
+----+-----------+
| 4  | Station D |
+----+-----------+

And I have some rides stored in the rides collection:

+----+---------------+-------------+
| id | fromStationId | toStationId |
+----+---------------+-------------+
| 1  | 3             | 4           |
+----+---------------+-------------+
| 2  | 2             | 1           |
+----+---------------+-------------+
| 3  | 1             | 1           |
+----+---------------+-------------+
| 4  | 3             | 2           |
+----+---------------+-------------+

I would like to create a count list of all inter-station rides between all possible pairs of fromStation names and toStation names with the result looking like this:

[
  {
    "fromStation": "Station A",
    "toStation": "Station A",
    "count": 1196
  },
  {
    "fromStation": "Station A",
    "toStation": "Station B",
    "count": 1
  },
  {
    "fromStation": "Station A",
    "toStation": "Station C",
    "count": 173
  },
]

And so on for all other combinations...

How do I get all possible two-pair combinations of station names and then count the number of rides among them? I'm using the latest version of Postgres.


Solution

  • Aggregate rides first, then resolve IDs to names:

    SELECT f.name AS from_station, t.name AS to_station, count
    FROM  (
       SELECT from_station_id, to_station_id, count(*) AS count
       FROM   rides
       GROUP  BY 1, 2
       ) r
    JOIN   stations f ON f.id = r.from_station_id
    JOIN   stations t ON t.id = r.to_station_id
    ORDER  BY 1, 2;  -- optional order
    

    Of course, that only produces combinations with actual rides. If you need to include combinations without any rides, you need an OUTER join to a Cartesian product of the stations table with itself. Something like:

    -- include all combinations (even without rides)
    SELECT from_station, to_station, COALESCE(count, 0) AS count
    FROM  (
       SELECT from_station_id, to_station_id, count(*) AS count
       FROM   rides
       GROUP  BY 1, 2
       ) r
    RIGHT  JOIN (
       SELECT f.id AS from_id, f.name AS from_station
            , t.id AS to_id  , t.name AS to_station
       FROM   stations f CROSS JOIN stations t
       ) s ON  s.from_id = r.from_station_id
          AND  s.to_id   = r.to_station_id
    ORDER  BY 1, 2;  -- optional order
    

    Again, it's cheaper to aggregate rides before joining to stations.

    To wrap it up as JSON array or records, simply:

    SELECT json_agg(sub)
    FROM  (
       -- query from above
       ) sub;
    

    db<>fiddle here