
How do you even start a Geo IP database?


There are plenty of API services out there that offer IP-to-geolocation information. There are also static databases that you can buy and then get updates on. However, I'm curious about the process: how do you even start a database like that?

  • Can you make assumptions based on prefixes?
  • Do datacenters and ISPs publish some range that they own so you can make assumptions based on that?
  • Does it really need to be a database, or could it be a series of conditionals and range checks?
  • What if you just need US states, for example, could range checks be used to know that?
  • Why are updates needed?

Solution

  • How do you even start a database like that?

    Depending on the context, the law may regulate this activity. Here you are collecting IP addresses over timestamp ranges and combining them with geolocation data, and privacy legislation exists to protect the users behind those addresses.

    The capacity and performance of your chosen DBMS have to match the volume of stored data, which in this case is the total number of public addresses plus some other attributes.

    Can you make assumptions based on prefixes?

    Yes, but only up to a certain point.

    The primary source for IP address data is the Regional Internet Registries (RIRs). They allocate and distribute IP addresses among organizations located in their respective service regions, e.g. the African Network Information Centre (AfriNIC), the Asia-Pacific Network Information Centre (APNIC), etc.

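    The registries allow assignees to specify the country of each assigned block. That gives you country-level information per block but nothing finer, and the declared country is not guaranteed to be where the addresses are actually used.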
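
    As a rough illustration of what block-level registry data looks like, the sketch below parses records in the pipe-separated "delegated statistics" layout that the RIRs publish (`registry|cc|type|start|value|date|status`). This is not something the answer itself prescribes; the function name `parse_delegated_line` and the sample records are made up for the example, and the addresses come from ranges reserved for documentation.

```python
import ipaddress

# Illustrative records only (documentation address ranges), not real allocations.
SAMPLE = """\
# registry|cc|type|start|value|date|status
ripencc|FR|ipv4|192.0.2.0|256|20100101|allocated
apnic|JP|ipv4|198.51.100.0|384|20100101|assigned
arin|US|ipv6|2001:db8::|32|20100101|allocated
"""

def parse_delegated_line(line):
    """Return (country_code, [networks]) for an IPv4/IPv6 record, else None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split("|")
    # Header, summary and ASN lines are skipped.
    if len(fields) < 7 or fields[2] not in ("ipv4", "ipv6"):
        return None
    country, kind, start, value = fields[1], fields[2], fields[3], fields[4]
    if kind == "ipv4":
        # For IPv4 the value field is a count of addresses, which need not
        # be a power of two, so the range may split into several CIDR blocks.
        first = ipaddress.IPv4Address(start)
        last = first + (int(value) - 1)
        networks = list(ipaddress.summarize_address_range(first, last))
    else:
        # For IPv6 the value field is the prefix length.
        networks = [ipaddress.IPv6Network(f"{start}/{value}")]
    return country, networks

if __name__ == "__main__":
    for raw in SAMPLE.splitlines():
        record = parse_delegated_line(raw)
        if record:
            print(record[0], [str(n) for n in record[1]])
```

    Each parsed record is only a (country, prefix) pair, which is why prefixes alone stop at country granularity.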

    Do datacenters and ISPs publish some range that they own so you can make assumptions based on that?

    Yes for ISPs, which act as Local Internet Registries (LIRs); each Regional Internet Registry is made up of LIRs. For example, you can write a script that queries range information from the RIPE (Europe) RIR database, which is exposed through a REST API at https://apps.db.ripe.net
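
    Whatever way such a script fetches its data, it eventually has to turn a registry record into usable ranges. The sketch below assumes the record is an `inetnum` object in the whois-style `key: value` text form; the object shown is invented for illustration (documentation address range, hypothetical `netname`), and `parse_inetnum` is a name chosen here, not part of any RIPE tooling.

```python
import ipaddress

# Hand-written sample in whois-style text form, not a real registry entry.
SAMPLE_OBJECT = """\
inetnum:        192.0.2.0 - 192.0.2.255
netname:        EXAMPLE-NET
country:        NL
status:         ASSIGNED PA
"""

def parse_inetnum(text):
    """Extract the CIDR blocks, country and netname from an inetnum object."""
    attrs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Keep the first occurrence of each attribute.
        if sep and key.strip() not in attrs:
            attrs[key.strip()] = value.strip()
    first, last = (
        ipaddress.IPv4Address(part.strip())
        for part in attrs["inetnum"].split("-")
    )
    return {
        "networks": list(ipaddress.summarize_address_range(first, last)),
        "country": attrs.get("country"),
        "netname": attrs.get("netname"),
    }

if __name__ == "__main__":
    info = parse_inetnum(SAMPLE_OBJECT)
    print(info["country"], [str(n) for n in info["networks"]])
```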

    Does it really need to be a database, or could it be a series of conditionals and range checks?

    To improve accuracy, some services cross-reference data from various other types of sources, including internet crawlers. One can also imagine a service that chooses whether to store or update each IP address submitted for lookup. A dedicated DBMS seems unavoidable given the number of public IPv4/IPv6 addresses to store.
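
    For comparison, here is a minimal sketch of the "range check" alternative: a sorted in-memory table of ranges searched with binary search, which is essentially what a database index would do at larger scale. The three ranges and their labels are invented (documentation prefixes), the table is assumed to be non-overlapping, and only IPv4 is handled.

```python
import bisect
import ipaddress

# Hypothetical table of (first_address, last_address, label), stored as
# integers and sorted by first address. Ranges must not overlap.
RANGES = sorted(
    (int(net.network_address), int(net.broadcast_address), label)
    for net, label in (
        (ipaddress.ip_network("192.0.2.0/24"), "FR"),
        (ipaddress.ip_network("198.51.100.0/24"), "JP"),
        (ipaddress.ip_network("203.0.113.0/24"), "US"),
    )
)
STARTS = [first for first, _, _ in RANGES]

def lookup(ip):
    """Return the label of the range containing ip, or None if there is none."""
    value = int(ipaddress.ip_address(ip))
    # Index of the last range whose first address is <= value.
    i = bisect.bisect_right(STARTS, value) - 1
    if i >= 0 and value <= RANGES[i][1]:
        return RANGES[i][2]
    return None

if __name__ == "__main__":
    for candidate in ("198.51.100.7", "10.0.0.1"):
        print(candidate, lookup(candidate))
```

    A table like this works while the data fits in memory and changes rarely; the same lookup applies unchanged if the labels are states rather than countries, with accuracy limited by the source data rather than by the lookup.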

    What if you just need US states, for example, could range checks be used to know that?

    Yes, but not always with an accuracy that you would find satisfactory.

    Why are updates needed?

    Most of the time, ISPs assign public IP addresses to endpoints dynamically, so the endpoint behind a given address, and therefore its location, may change over time.