python, performance, python-2.7, maxmind

MaxMind IP Lookup Speed with Python API


I'm building a script to process web server logs, and I'm trying to incorporate MaxMind's IP dataset (http://dev.maxmind.com/geoip/legacy/geolite/) into it so I can find the country each hit comes from.

Currently, my script works fine when it only extracts the information I want. However, when I add IP lookups it slows down a lot, by about 1800%. So I'm curious whether this has something to do with my code or whether there is a way to speed it up.

For example, the following code, which extracts the date and IP address, took about 6.5 seconds in this experiment:

extractedData = []

# Each parsed log entry is a list of fields: the date comes first, the client IP last.
for log in logList:
    ip = log[-1]
    date = log[0]
    dateIP = [date, ip]
    extractedData.append(dateIP)
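To compare the two versions reproducibly, each loop can be wrapped in a function and timed with `timeit.default_timer`, which exists in both Python 2.7 and Python 3. This is a minimal sketch: `extract_date_ip`, `time_call`, and the sample log entries are hypothetical, and the field layout (date first, IP last) is assumed from the loop above.

```python
import timeit


def extract_date_ip(log_list):
    """Collect [date, ip] pairs, assuming date is the first field and the IP the last."""
    return [[log[0], log[-1]] for log in log_list]


def time_call(func, *args):
    """Return (result, elapsed_seconds) for a single call of func(*args)."""
    start = timeit.default_timer()
    result = func(*args)
    elapsed = timeit.default_timer() - start
    return result, elapsed


# Hypothetical sample data: each parsed log line is a list of fields.
sample_logs = [
    ["2014-01-01", "GET", "/index.html", "203.0.113.5"],
    ["2014-01-01", "GET", "/about.html", "198.51.100.7"],
]

pairs, seconds = time_call(extract_date_ip, sample_logs)
print(pairs)  # [['2014-01-01', '203.0.113.5'], ['2014-01-01', '198.51.100.7']]
```

Timing each variant through the same helper keeps the measurements comparable when switching between the plain extraction and the version with lookups.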

When I add pygeoip and incorporate the country, it slows down. The following code took 2 minutes and 7 seconds to run:

import pygeoip

extractedData = []

# Open the legacy GeoIP country database once, outside the loop.
gi = pygeoip.GeoIP('/path/to/GeoIP.dat')

for log in logList:
    ip = log[-1]
    country = gi.country_name_by_addr(ip)
    date = log[0]
    dateCountry = [date, country]
    extractedData.append(dateCountry)
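Independently of how the database is opened, web server logs usually contain the same client IP on many lines, so resolving each distinct IP only once can reduce the number of lookups. This is a minimal sketch assuming a lookup callable such as `gi.country_name_by_addr`; `make_cached_lookup`, `fake_country_lookup`, `FAKE_DB`, and the sample data are hypothetical stand-ins so the example runs without the database. A plain dict is used rather than `functools.lru_cache`, which is not available in Python 2.7.

```python
def make_cached_lookup(lookup):
    """Wrap a per-IP lookup so each distinct IP is resolved only once."""
    cache = {}

    def cached(ip):
        # Call the underlying (slow) lookup only on a cache miss.
        if ip not in cache:
            cache[ip] = lookup(ip)
        return cache[ip]

    return cached


# Hypothetical stand-in for gi.country_name_by_addr that records each call.
FAKE_DB = {"203.0.113.5": "Country A", "198.51.100.7": "Country B"}
calls = []


def fake_country_lookup(ip):
    calls.append(ip)
    return FAKE_DB.get(ip, "")


log_list = [
    ["2014-01-01", "203.0.113.5"],
    ["2014-01-01", "203.0.113.5"],
    ["2014-01-02", "198.51.100.7"],
]

country_of = make_cached_lookup(fake_country_lookup)
extracted = [[log[0], country_of(log[-1])] for log in log_list]
print(len(calls))  # 2: one underlying lookup per distinct IP
```

With the real database, `make_cached_lookup(gi.country_name_by_addr)` would replace the stand-in. How much this helps depends on how often IPs repeat in the logs, and the cache grows with the number of distinct IPs.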

So, is there a way to speed this up? As it stands, the lookup slows the processing down too much.

Thanks!


Solution

  • Since you're doing many queries, you should load the database into memory. As it stands, you're repeatedly reading from the disk, which is painfully slow.

    Replace this line:

    gi = pygeoip.GeoIP('/path/to/GeoIP.dat') 
    

    to this:

    gi = pygeoip.GeoIP('/path/to/GeoIP.dat', pygeoip.MEMORY_CACHE)
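The reasoning behind this answer can be illustrated without pygeoip. The toy sketch below does not use the GeoIP.dat format; it writes hypothetical fixed-width records (`RECORD_SIZE`, `write_records`, `read_from_file`, and `read_from_memory` are all illustrative names) and contrasts a reader that seeks and reads from an open file on every query with one that loads the whole file into a bytes object once and then only slices it. Both return the same records, but the in-memory version avoids file I/O on each lookup, which is the kind of per-query cost that loading the database into memory is meant to remove.

```python
import os
import tempfile

RECORD_SIZE = 4  # hypothetical fixed record width, in bytes


def write_records(path, values):
    """Write each value as a fixed-width, space-padded ASCII record."""
    with open(path, "wb") as f:
        for value in values:
            f.write(value.encode("ascii").ljust(RECORD_SIZE, b" "))


def read_from_file(fp, index):
    """Seek and read one record from an open file on every query."""
    fp.seek(index * RECORD_SIZE)
    return fp.read(RECORD_SIZE).rstrip(b" ").decode("ascii")


def read_from_memory(data, index):
    """Slice one record out of bytes that were loaded once."""
    start = index * RECORD_SIZE
    return data[start:start + RECORD_SIZE].rstrip(b" ").decode("ascii")


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    write_records(path, ["US", "DE", "JP"])
    with open(path, "rb") as fp:
        # File-backed: one seek and one read per query.
        from_file = [read_from_file(fp, i) for i in (2, 0, 1)]
        fp.seek(0)
        data = fp.read()  # load the whole file into memory once
    # Memory-backed: every later query is just a slice.
    from_memory = [read_from_memory(data, i) for i in (2, 0, 1)]
finally:
    os.remove(path)

print(from_file == from_memory)  # True
```

The trade-off is memory: the whole file stays resident for the life of the process, which is usually acceptable for a country-level database processed in a long batch run.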