Search code examples
sqldatabaseipv6ipv4

Relational database for tracking IPv6/IPv4 addresses -- will my proposed schema work?


Background

I'm building an IPAM app to track and store metadata for individual IPv4 and IPv6 addresses. The backend is intended to be a boring, vendor-agnostic relational database.

IPv6 may deal with big numbers in a vast addressable space, but the scope in question does not inherently constitute big data, so I'm not willing to change backend architectures without some actual technical shortcoming of my current approach that is better served by a hip NoSQL solution at the expense of relations and ACIDity.

(I'm not trying to document the entire address space, only live addresses in use by arbitrary customers.)

Schema

Normalize the string representation of a given IP address and use that as the primary key. IPv4 addresses get converted to IPv6 and prefixed with ffff. IPv6 addresses get compressed and lowercased.

A second field indicates which protocol version the record in question is-- 4 or 6. The idea here is that if a user searches for records in an IPv4 subnet, I can quickly exclude the IPv6 space, or vice versa.

The next eight fields (ugh) are all integer representations of each octet in the address (octet_1, octet_2, etc.).

Indexes

Primary key should already be its own unique index.

Create an additional index on (version, octet_1, ..., octet_8).

Querying

For searching for a specific IP of either version, I can simply normalize the IP string the same way as above and search through the primary keys.

For searching by subnet, the application calculates the start/end address for the range, casts both as IPv6, converts both to octuples, and issues a query for all records with octuples between those.

What problems might I run into with this approach? Suggestions for improvement?

Anything from ipv4s casted as ipv6 are not the same thing to your index will explode / write performance will suck is fair game.

I built a test POC which validates the functionality of this schema but I'm concerned about any potential shortcomings of this model in a production environment.


Solution

  • If you can choose database backend then go for PostgreSQL. It has built in types for IP addresses and therefore offers great performance and features.

    But you said that you wanted to be database agnostic, so let's focus on that. In that case I would do just the string representation with IPv4 addresses prefixed with ::ffff:, but then use only lower case hexadecimal notation and no compression. So IPv4 address 10.11.12.13 would become 0000:0000:0000:0000:0000:ffff:0a0b:0c0d.

    Almost all databases have good indexing on strings, and with this notation you can easily do subnet and range queries. If you want all IPv4 addresses just query for LIKE '0000:0000:0000:0000:0000:ffff:%'. Because it's anchored at the start a standard btree index should work well. You can do more complex queries for ranges with < and > operators, which again can benefit from a standard index. That should give you most subnet queries.

    In your application it shouldn't be too hard to parse the strings with inet_pton etc to convert them to whatever you need.

    I'd avoid denormalising in this case. With what I described above you shouldn't need separate version or octet columns. They'll only slow things down and increase the chances of inconsistencies.