Search code examples
sqlpostgresqlaggregates

How to identify non-identical values being aggregated with COUNT()?


I have a table containing three columns: city ID, language code and city:

city_id | language_code | city
----------------------------------
1       | en            | London
1       | es            | Londres
1       | pt            | Londres
2       | de            | Köln
2       | en            | Cologne
3       | it            | Cologne

In some foreign languages cities can be spelled the same way, e.g. the Spanish and Portuguese name for London is Londres.

But there are cases where the same name can refer to completely different locations, e.g. Cologne is an English name for the German city of Köln but there's also a town of the same name in Italy.

I would like to be able to detect cities that have more than one entry in the table but only those that are linked to different city_id values. In my case this would be Cologne but not Londres as both es and pt language versions point to the same city_id.

I thought this would be a fairly easy thing to do but I haven't been able to get the results in a single query. Instead, I am deduping the results first and then aggregating them:

WITH deduped_cities AS (
  SELECT DISTINCT city, city_id
  FROM cities
  ORDER BY city
)
SELECT city, COUNT(city_id) AS total
FROM deduped_cities
GROUP BY city
HAVING COUNT(city_id) > 1;

This gives me the expected result:

city    | total
----------------
Cologne | 2

I was just wondering if it is possible to achieve the same effect with a single SELECT statement.

DB Fiddle


Solution

  • you have basically just one SELECT, but you can use DISTINCT' with the COUNT

    SELECT city, COUNT(city_id) AS total
    FROM cities
    GROUP BY city
    HAVING COUNT( DISTINCT city_id) > 1;
    

    See fiddle