Tags: machine-learning, google-api, gis, geocoding, google-geocoder

Google geocoding API Inner Workings


I'm currently working with some large datasets that include location-based information but lack the direct latitude and longitude measurements I need in order to create visualizations.

To solve this problem, I've been using geocoding APIs, which take addresses or address-like information as input and return latitude and longitude information as output.

I started with the Nominatim API. Unfortunately, due to the nature of my address-like data, many of my queries failed, so I switched to the Google geocoding API. The Google API gave me a significantly higher success rate, but it is a paid API, which is not ideal.
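For context on how these APIs are called: both take a free-form address string and return structured results containing coordinates. Below is a minimal sketch that only builds the documented request URLs for the two services (no request is actually sent; the parameter names are the publicly documented ones):

```python
from urllib.parse import urlencode

def nominatim_url(query):
    # Nominatim free-text search; format=json requests JSON results
    params = urlencode({'q': query, 'format': 'json', 'limit': 1})
    return 'https://nominatim.openstreetmap.org/search?' + params

def google_geocode_url(query, api_key):
    # Google Geocoding API; requires a (billed) API key
    params = urlencode({'address': query, 'key': api_key})
    return 'https://maps.googleapis.com/maps/api/geocode/json?' + params

print(nominatim_url('1600 Pennsylvania Ave NW, Washington DC'))
```

The JSON response from either service contains the matched address plus `lat`/`lon` (Nominatim) or a `geometry.location` object (Google).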

I realize that, given Google's incredible resources, it would be virtually impossible to build a system that rivals their geocoding API in a reasonable amount of time, but it's made me wonder what's going on under the hood.

Is a BERT-like translational system at work? What happens to the text after it's sent off?


Solution

  • I'm using n-grams for a similar use case, by building an index and an inverted index. See the ngram package.

    import csv
    import ngram

    ind = {}  # per-country n-gram index of addresses
    inv = {}  # per-country inverted index: normalized address -> (coords, address)

    # filename and stream come from iterating over one CSV file per country
    country = filename.replace('.csv', '')
    ind[country] = ngram.NGram()
    inv[country] = {}
    s_csv = csv.reader(stream, delimiter=';')
    next(s_csv)  # skip the header row
    for row in s_csv:
        coord = tuple(map(float, row[0:2]))  # latitude, longitude
        address = ' '.join(row[2:])
        ad = address.lower()  # normalized key for matching
        ind[country].add(ad)
        inv[country][ad] = (coord, address)
    

    Then you can use the find function to look up the closest matching address for a query.
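    As a rough illustration of what that n-gram matching does, here is a pure-Python sketch (not the ngram package itself, which uses a slightly different similarity formula than the plain Jaccard overlap used here):

```python
def trigrams(s):
    # pad the string so prefixes and suffixes also contribute n-grams
    s = '  ' + s.lower() + ' '
    return {s[i:i + 3] for i in range(len(s) - 2)}

def similarity(a, b):
    # Jaccard-style overlap of the two trigram sets
    ga, gb = trigrams(a), trigrams(b)
    return len(ga & gb) / len(ga | gb)

def find(query, index):
    # return the indexed address with the highest trigram similarity
    return max(index, key=lambda addr: similarity(query, addr))

index = ['10 downing street london', '1600 pennsylvania avenue washington']
print(find('10 downing st london', index))  # matches despite the abbreviation
```

    This is why n-gram indexing tolerates typos and abbreviations that make exact address lookups fail.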

    Watch out for memory consumption: expect roughly 16 GB of RAM for a country like France with OSM data.

    To see an implementation of this approach, check the OpenGeoCode HTTP API Service source code.