Search code examples
algorithmsearchip

IP substring search when IP is stored as an integer


Let's say I have a lot of IPv4s stored as integers (specifically, in a relational database) and I want to do a substring search on them given a string representation of an IP.

For example, a user types in 12.3 and expects that they get back results such as 12.30.45.67, 192.168.12.3, 1.12.34.5, 9.212.34.5.

If the IP were a string, I could just do a plain substring search. It might not be efficient but it is at least simple to implement and understand. But because I can't readily change it into a string at the moment, I don't see any efficient (in terms of CPU cycles, memory, and also my development/implementation time) way of doing this, but maybe I am just missing something.


Solution

  • You aren't missing anything.

    For example try to turn 12.3 into a series of ranges. In whichever octet the 12 is in, there will be 3 options (12, 112, 212). In whichever octet the 3 is in there will be 2 options (3 and 30-39). That's 6 range per combination of preceding octets.

    But the preceding octets? We have 1 + 256 + 256*256 depending whether 0, 1 or 2 octets precede your start.

    That's a grand total of 3 * 2 * (1 + 256 + 256*256) = 394758 ranges of numbers you have to search in. It is unlikely that doing that many index searches will be faster than scanning everything.

    Incidentally the worst case would be 1.2. In that case you'd have had 17 * 3 * (1 + 256 + 256*256) = 3355443 range lookups to do!

    If they want this badly enough, you need to do a full text search on strings.