Search code examples
mysqlgroup-bygeolocation

SQL Group By. Duplicate lat/lng in differents cities


I bought a geo-database a long time ago and I'm updating its precision to the lat/lng values. But I've found some weird stuff. There are some cities that have the same lat/lng coordinates. Thing that is geographically impossible.

id   City    State   Lat    Lng
1     A        sA     XX     XX
2     B        sA     XX     XX
3     C        sA     YY     YY
4     D        sA     ZZ     ZZ

So I tried Group By City, Lat, Lng but as I need the id to update the record the group by clause will ask me to add ´id´ column.

From the table ids 1 and 2 should be updated leaving 3 and 4 out. It shouldn't be 2 (or more) cities with the same Lat/Lng. The Table has 22K rows. I could send all to gmap API but I'm looking for use the time, bandwith and hits to the API as smart as possible but I'm running out of time considering I can make a request per second using the free API access.

I've tried

    SELECT DISTINCT postcodes_id, Latitude, Longitude, Region1Name, Region2Name, Nation_D
      FROM postcodes
      where Latitude + Longitude  IN 
      (
             SELECT Latitude + Longitude
             FROM
             (  
                        SELECT postcodes_id, Latitude, Longitude, count(distinct(Region2Name))  as cantidad 
                        FROM postcodes
                        where Nation_D is not null
                        GROUP BY Latitude, Longitude
                        having count(distinct(Region2Name)) > 1
             ) A
      )
      AND Nation_D IS NOT NULL

  ORDER BY  Latitude, Longitude, Region1Name, Region2Name, Nation_D

But is not working as expected. I think its pretty obvious for a new pair of eyes.

I wrote a python script to use Google Map geocode to get the current Lat/Lng and update it if it's different. This script works ok.

Hope someone has an idea. Thanks!!

Running MySQL 5.5 and Python 2.7 on a CentOS 7.


Solution

  • Just some pointers for you, which may be helpful:

    1. You should not use group by or distinct on lat/lon or any combination of them, since they are contiguous floating points numbers and not discrete integers or strings.

    2. By the same token, you should not use WHERE clauses on lat/lon or their sum. If you mean to check for proximity of two locations, use st_distance() function instead.

    3. Multiple city names can refer to the same location. For example, New York, NY and Manhattan, NY.

    And a non-technical point: storing Google geocoding data in your database is against their licensing agreement.