Search code examples
sql-servert-sqlipv6ipv4maxmind

Query Maxmind GeoLite2 City Blocks csv in SQL Server to get geoname_id for IPv6 and IPv4 addresses


Maxmind has two free GeoLite2 cvs files that we can use to look up locations associated with IP address ranges. I want to create a stored proc in SQL Server that uses the imported tables ([GeoLite2_City_Blocks] and [GeoLite2_City_Locations]), taking a varchar(45) string passed in IP address and returns the location associated with that IP address. I would have thought related T-SQL code would exist on the web, but I cannot find it.

I checked out Querying GeoLite2 Country CSV in SQL, but I simply don't understand what Prefix_Length (which is called Network_mask_Length in the Blocks table), does or is. If they are the same as CIDR Prefixes (see: http://en.wikipedia.org/wiki/IPv6_subnetting_reference), I am even more confused, since the Network_mask_Length could be 118, 119, or any of many other numbers not shown on that Wikipedia page.

I have seen a web page talking about IPv6 in SQL Server that suggested padding with zeros for doing comparisons. I suppose that the free GeoLite2 cannot simply be queried without at least an additional column being added to the table for the end IP address of the range?

I suppose I could strip out "::ffff:" from in front of the IPv4 addresses, if that is the correct way to say that. But still I don't get how to translate that network_mask_length (prefix) field into something I understand. And then zero pad the IPv6 addresses, but then I still don't know what that prefix is about.

My questions are:

  1. Is there sample code available anywhere which can get me on the right track? Or,
  2. What do I need to do to create a proc that will accept an address in IPv4 or IPv6, and return the relevant location or geoname_id?

Solution

  • The prefix length is the number of significant bits in the network address. For any IPv6 address, this is a number between 0-128 as IPv6 addresses are 128 bits in length. For instance an network with a prefix of 128 would contain just one IP address. A network with a prefix of 127 would contain two addresses. 126 would have 4, and so on.

    IPv6 addresses of the form ::FFFF:10.11.12.13, as you note, correspond to the IPv4 address with the '::FFFF:' removed. The '::FFFF:' corresponds to the first 96 bits of the address. If the IPv6 address had a prefix length of 126, the IPv4 address would have a prefix length of 30 (i.e., the last two bits are significant).

    A Google search should reveal code snippets to convert CIDR notation to ranges. If you only care about IPv4, it might be easiest to pre-filter the data, removing the ::FFFF:, subtracting 96 from the prefix length, and discarding the address ranges not mapped to IPv4.