I have a pandas DataFrame df_test containing IP addresses, like below:
| cs-username | c-ip |
+--------------+-------------+
|- | 70.80.84.76 |
|- | 70.80.84.76 |
|- | 70.80.84.76 |
|- | 70.80.84.76 |
My goal is to get the country name for each IP address, and I have used DbIpCity from ip2geotools. So I have written the code below.
from ip2geotools.databases.noncommercial import DbIpCity
#Your code
df_test['Country'] = df_test.apply(lambda row: DbIpCity.get(row['c-ip'],api_key='free').country, axis=1)
However, this results in the error below:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-8-3772268ef132> in <module>()
2
3 #Your code
----> 4 df_test['Country'] = df_test.apply(lambda row: DbIpCity.get(row['c-ip'],api_key='free').country, axis=1)
5 frames
/usr/local/lib/python3.7/dist-packages/ip2geotools/databases/noncommercial.py in get(ip_address, api_key, db_path, username, password)
65 # format data
66 ip_location.country = content['countryCode']
---> 67 ip_location.region = content['stateProv']
68 ip_location.city = content['city']
69
KeyError: 'stateProv'
The code is in the Colab notebook below (last cell) for reference: https://colab.research.google.com/drive/1zz1LZ2uOAp1YsX0x0CJfvcM21XGkeCO5?usp=sharing
So how can I resolve this error?
Thanks
The program throws a KeyError when the response for an IP address lacks the data the library expects (here, the 'stateProv' field). To keep the script from stopping, you could catch the exception. But because the ip2geotools library has a request limit, I decided to go with geolocation-db instead (I used a for loop instead of a lambda):
import pandas as pd
import numpy as np
import urllib.request
import json

df = pd.read_csv('temp.csv')
countries = []
ips = []

# Get country info from https://geolocation-db.com
def getCountry(ip):
    with urllib.request.urlopen("https://geolocation-db.com/jsonp/" + ip) as url:
        data = url.read().decode()
        # The response is JSONP, e.g. callback({...}); keep only the JSON part
        data = data.split("(")[1].strip(")")
        return json.loads(data)['country_name']

for index, row in df.iterrows():
    # Get IP data
    data = row['c-ip']
    # Skip IPs that have already been looked up
    if data not in ips:
        print(data)
        ips.append(data)
        #response = DbIpCity.get(row['c-ip'], api_key='free')
        response = getCountry(row['c-ip'])
        if response is not None:
            print(response)
            # Add to country list
            countries.append(response)
        # If country is None, add np.nan instead of None
        else:
            print(np.nan)
            countries.append(np.nan)

# Insert all data into a new df
ips = {'ip': ips,
       'country': countries,
       }
df_ips = pd.DataFrame(ips, columns=['ip', 'country'])
print(df_ips)
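If you would rather keep the original library and go the exception route mentioned above, here is a minimal sketch of the idea. The names safe_lookup, fake_lookup and _FAKE_DB are hypothetical, and the fake lookup with its made-up data only exists so the example runs offline. I am also assuming that KeyError, ValueError and OSError cover the failures you care about; the library may raise other exception types that you would need to add.

```python
import math

def safe_lookup(lookup, ip):
    """Return lookup(ip), or NaN if the lookup raises one of the expected errors."""
    try:
        return lookup(ip)
    except (KeyError, ValueError, OSError) as exc:
        # KeyError: a field such as 'stateProv' is missing from the response.
        # ValueError: includes json.JSONDecodeError for malformed responses.
        # OSError: includes urllib.error.URLError and other network failures.
        print(f"lookup failed for {ip}: {exc!r}")
        return math.nan

# Hypothetical stand-in for a real geolocation call; the data is made up.
_FAKE_DB = {"192.0.2.1": {"countryCode": "ZZ"}}

def fake_lookup(ip):
    return _FAKE_DB[ip]["countryCode"]

found = safe_lookup(fake_lookup, "192.0.2.1")       # known IP, returns "ZZ"
missing = safe_lookup(fake_lookup, "198.51.100.7")  # unknown IP, returns NaN
```

With the code from the question, this would presumably be used as df_test['c-ip'].apply(lambda ip: safe_lookup(lambda i: DbIpCity.get(i, api_key='free').country, ip)), so one bad row yields NaN instead of aborting the whole apply.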
And because your CSV file is so large, filter out duplicate IPs so that each address is only looked up once.
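One possible way to do that filtering without tracking the seen IPs by hand is to cache the lookup, so repeated addresses never trigger a second request. This is only a sketch: lookup_country is a hypothetical stand-in for the real network call, its data is made up, and the calls list is there purely to show how many real lookups happened.

```python
from functools import lru_cache

calls = []  # records every "real" lookup, for illustration only

def lookup_country(ip):
    # Hypothetical stand-in for the real network call; the data is made up.
    calls.append(ip)
    return {"192.0.2.1": "Exampleland"}.get(ip)

@lru_cache(maxsize=None)
def cached_lookup(ip):
    # Each distinct IP reaches lookup_country only once; repeats hit the cache.
    return lookup_country(ip)

ips = ["192.0.2.1", "192.0.2.1", "198.51.100.7", "192.0.2.1"]
countries = [cached_lookup(ip) for ip in ips]
print(countries)
print(len(calls), "real lookups for", len(ips), "rows")
```

With a DataFrame, the same cached function can be passed to Series.map, e.g. df['country'] = df['c-ip'].map(cached_lookup), which keeps one result per row while still making only one request per distinct IP.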
I also found these errors in your log:
ERROR: geoip2 4.1.0 has requirement requests<3.0.0,>=2.24.0, but you'll have requests 2.23.0 which is incompatible.
ERROR: geoip2 4.1.0 has requirement urllib3<2.0.0,>=1.25.2, but you'll have urllib3 1.24.3 which is incompatible.
Try running pip install --upgrade requests urllib3 to bring both packages up to the versions geoip2 requires.