python, ip-address, cidr

Quickly locate an IP address within 3 million rows of CIDR-formatted IP ranges


I have obtained an IP-based geolocation database in the following format:

1.0.0.0/24,2077456,2077456,,0,0,,-33.4940,143.2104,1000
1.0.1.0/24,1814991,1814991,,0,0,,34.7725,113.7266,50
1.0.2.0/23,1814991,1814991,,0,0,,34.7725,113.7266,50
1.0.4.0/22,2077456,2077456,,0,0,,-33.4940,143.2104,1000
1.0.8.0/21,1814991,1814991,,0,0,,34.7725,113.7266,50
...
223.255.254.0/24,1880252,1880251,,0,0,37,1.3267,103.8869,5
223.255.255.0/24,2077456,2077456,,0,0,,-33.4940,143.2104,1000

The CSV file contains over 3 million rows in total. The first field of each line is an IP range in CIDR notation.

I need an efficient way to locate a given IP address among these lines. For example, given the IP address 1.0.1.2, I want to quickly find the second line (1.0.1.0/24) so that I can read its coordinates from the remaining fields of that row. Is there an efficient way to do this without inspecting every row from the beginning?

The difficulty is that a range can cover addresses that do not share its textual prefix. For example, the range 1.0.2.0/23 includes the IP address 1.0.3.0, so plain string matching does not work.
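
To make the problem concrete, here is a small check using Python's standard ipaddress module. It is only an illustration of why string matching fails, not part of the dataset or of any particular solution; the variable names are chosen for this example.

```python
import ipaddress

net = ipaddress.ip_network("1.0.2.0/23")
addr = ipaddress.ip_address("1.0.3.0")

# Numeric containment: the /23 spans 1.0.2.0 through 1.0.3.255.
in_range = addr in net

# Naive textual comparison on the first three octets misses it.
prefix_match = str(addr).startswith("1.0.2.")

# The same range expressed as a pair of integers.
first = int(net.network_address)
last = int(net.broadcast_address)

print(in_range, prefix_match, first, last)
```

The range is really a pair of integer bounds, and membership is a numeric comparison against those bounds rather than a comparison of strings.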


Solution

  • I found a way myself. First, I convert each dotted-quad IP address (four 8-bit sections) into a single integer, so that every CIDR range becomes a pair of integers: its first and last address. Then I use the bisect module to binary-search for the range that contains the given IP.
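
The answer does not include its code, so the following is only a minimal sketch of how the described approach might look. It assumes the ranges do not overlap (as appears to be the case in the sample rows), uses the standard ipaddress module for the conversion to integers, and uses hypothetical helper names (build_index, lookup). The sample rows are the ones quoted in the question; a real run would read the full CSV file instead.

```python
import bisect
import csv
import io
import ipaddress

def build_index(rows):
    """Turn CSV rows into parallel lists sorted by range start.

    Each row's first field is a CIDR range; the remaining fields are kept
    untouched as the payload. Assumes ranges do not overlap.
    """
    entries = []
    for row in rows:
        net = ipaddress.ip_network(row[0])
        entries.append(
            (int(net.network_address), int(net.broadcast_address), row[1:])
        )
    entries.sort(key=lambda e: e[0])
    starts = [e[0] for e in entries]
    ends = [e[1] for e in entries]
    payloads = [e[2] for e in entries]
    return starts, ends, payloads

def lookup(ip, starts, ends, payloads):
    """Return the payload of the range containing ip, or None."""
    value = int(ipaddress.ip_address(ip))
    # Index of the last range whose start is <= value.
    i = bisect.bisect_right(starts, value) - 1
    if i >= 0 and value <= ends[i]:
        return payloads[i]
    return None

# Sample rows copied from the question (first five lines of the file).
SAMPLE = """\
1.0.0.0/24,2077456,2077456,,0,0,,-33.4940,143.2104,1000
1.0.1.0/24,1814991,1814991,,0,0,,34.7725,113.7266,50
1.0.2.0/23,1814991,1814991,,0,0,,34.7725,113.7266,50
1.0.4.0/22,2077456,2077456,,0,0,,-33.4940,143.2104,1000
1.0.8.0/21,1814991,1814991,,0,0,,34.7725,113.7266,50
"""

starts, ends, payloads = build_index(csv.reader(io.StringIO(SAMPLE)))

print(lookup("1.0.1.2", starts, ends, payloads))   # row for 1.0.1.0/24
print(lookup("1.0.3.0", starts, ends, payloads))   # row for 1.0.2.0/23
print(lookup("1.0.16.0", starts, ends, payloads))  # outside the sample
```

Building the index is a one-time pass over the file; after that each lookup is a binary search, so it takes on the order of log2(3,000,000), roughly 22, comparisons instead of a scan of every row. The check against ends[i] matters because the ranges need not be contiguous: an address that falls in a gap between two ranges should return no match.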