I have obtained an IP-based geolocation database in the following format:
1.0.0.0/24,2077456,2077456,,0,0,,-33.4940,143.2104,1000
1.0.1.0/24,1814991,1814991,,0,0,,34.7725,113.7266,50
1.0.2.0/23,1814991,1814991,,0,0,,34.7725,113.7266,50
1.0.4.0/22,2077456,2077456,,0,0,,-33.4940,143.2104,1000
1.0.8.0/21,1814991,1814991,,0,0,,34.7725,113.7266,50
...
223.255.254.0/24,1880252,1880251,,0,0,37,1.3267,103.8869,5
223.255.255.0/24,2077456,2077456,,0,0,,-33.4940,143.2104,1000
The CSV file contains over 3 million rows in total. The first field of each line is an IP range in CIDR notation.
I need an efficient way to locate a given IP address among these lines. For example, given the IP address 1.0.1.2, I want to quickly find the second line so that I can read its coordinates from the remaining fields of that row. Is there an efficient way to do this without inspecting each row from the beginning?
The difficulty is that, for example, the IP range 1.0.2.0/23 includes the IP address 1.0.3.0, so simple string matching does not work well.
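To illustrate the problem, here is a small sketch using Python's standard ipaddress module (my choice for the illustration, not something the database requires). It shows that 1.0.3.0 falls inside 1.0.2.0/23 even though the two strings share no "1.0.2." prefix:

```python
import ipaddress

network = ipaddress.ip_network("1.0.2.0/23")
address = ipaddress.ip_address("1.0.3.0")

# A /23 leaves 9 host bits, so the range spans 512 addresses.
first = int(network[0])    # integer value of 1.0.2.0
last = int(network[-1])    # integer value of 1.0.3.255

# Numeric containment check versus a naive string prefix check.
contained = address in network
prefix_matches = str(address).startswith("1.0.2.")

print(contained, prefix_matches, last - first + 1)
```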
I found a way myself. First, I convert each dotted IP address (four 8-bit sections) into a single decimal integer, so that every CIDR range becomes a pair of decimal numbers: the start and the end of the range. Then I use the bisect module to find the range that the IP belongs to.
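A minimal sketch of this approach is below. It is only an illustration under some assumptions: the ranges in the file do not overlap, the ipaddress module is used for the conversion to integers, and the names SAMPLE, build_index and lookup are made up for this example. I am also assuming that fields 7 and 8 (counting from 0) are the latitude and longitude.

```python
import bisect
import csv
import ipaddress

# A few rows copied from the sample above; the real file has millions.
SAMPLE = """\
1.0.0.0/24,2077456,2077456,,0,0,,-33.4940,143.2104,1000
1.0.1.0/24,1814991,1814991,,0,0,,34.7725,113.7266,50
1.0.2.0/23,1814991,1814991,,0,0,,34.7725,113.7266,50
1.0.4.0/22,2077456,2077456,,0,0,,-33.4940,143.2104,1000
1.0.8.0/21,1814991,1814991,,0,0,,34.7725,113.7266,50
223.255.254.0/24,1880252,1880251,,0,0,37,1.3267,103.8869,5
223.255.255.0/24,2077456,2077456,,0,0,,-33.4940,143.2104,1000
"""


def build_index(lines):
    """Parse CSV lines into parallel lists sorted by range start."""
    entries = []
    for row in csv.reader(lines):
        if not row:
            continue
        net = ipaddress.ip_network(row[0])
        # Each CIDR range becomes two integers: first and last address.
        entries.append((int(net.network_address), int(net.broadcast_address), row))
    entries.sort(key=lambda entry: entry[0])
    starts = [entry[0] for entry in entries]
    ends = [entry[1] for entry in entries]
    rows = [entry[2] for entry in entries]
    return starts, ends, rows


def lookup(ip, starts, ends, rows):
    """Return the row whose range contains ip, or None if no range does."""
    value = int(ipaddress.ip_address(ip))
    # Rightmost range whose start is <= value.
    i = bisect.bisect_right(starts, value) - 1
    if i >= 0 and value <= ends[i]:
        return rows[i]
    return None


starts, ends, rows = build_index(SAMPLE.splitlines())
row = lookup("1.0.1.2", starts, ends, rows)
print(row[0], row[7], row[8])
```

Building the index costs one pass over the file plus a sort, after which each lookup is a binary search over the list of range starts. The check against the range end matters because the file may have gaps between ranges, in which case the lookup returns None.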