python, pandas, geolocation

Applying a .get() function to a pandas Series


I am working on a sample dataset to retrieve location information from IP addresses (some details have been changed to prevent identification):

import pandas as pd
temp2 = pd.DataFrame({'USER_ID': [1268, 12345, 4204, 4208], 'IP_ADDR': ['142.176.00.83', '24.000.63.230', '187.178.252.99', '187.178.250.99']})

My goal is to get latitude and longitude information using the ip2geotools Python package. The syntax is as follows:

!pip install ip2geotools
from ip2geotools.databases.noncommercial import DbIpCity
response = DbIpCity.get(a, api_key='free')
json_file = response.to_json()

where a='142.176.00.83'. This returns a JSON string like this:

'{"ip_address": "142.176.00.83", "city": "Charlotte", "region": "Prince Edward", "country": "CA", "latitude": 46.2, "longitude": -63.131}'
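Since to_json() returns a string, the two coordinates can be read from it with the standard json module. A minimal sketch using the sample string above; the names record, latitude and longitude are only illustrative:

```python
import json

# The sample JSON string shown above, as returned by response.to_json()
json_file = ('{"ip_address": "142.176.00.83", "city": "Charlotte", '
             '"region": "Prince Edward", "country": "CA", '
             '"latitude": 46.2, "longitude": -63.131}')

record = json.loads(json_file)   # parse the string into a plain dict
latitude = record["latitude"]
longitude = record["longitude"]

print(latitude, longitude)       # prints: 46.2 -63.131
```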

I am trying to apply the function to an entire pandas Series (in vectorized form) and retrieve latitude and longitude as two separate columns. Here is my attempt:

temp2['y'] = temp2['IP_ADDR'].apply(lambda x: DbIpCity.get(x, api_key='free'))

But this fails with InvalidRequestError.

But if I run the code on a single string, it works fine:

DbIpCity.get('2401:4900:40cc:e9cc:6ccc:348e:4020:2593', api_key='free')

ip2geotools.models.IpLocation(2401:4900:40cc:e9cc:6ccc:348e:4020:2593)

On the other hand, without the quotes it fails:

DbIpCity.get(2401:4900:40cc:e9cc:6ccc:348e:4020:2593, api_key='free')
SyntaxError: invalid syntax

But my data doesn't have quotes around it. If I try to add the quotes, it fails:

i=str(2401:4900:40cc:e9cc:6ccc:348e:4020:2593)
print("'"+str(i)+"'")      
    i=str(2401:4900:40cc:e9cc:6ccc:348e:4020:2593)
          ^
   SyntaxError: invalid syntax                      

Could I get some help on how to vectorize this operation and retrieve the fields from the JSON output? Thanks.


Solution

  • The error is raised by ip2geotools, not pandas, because the IP addresses are malformed. The code works for me after changing the IPs so that each all-zero part is a single 0.

    i.e. change '24.000.63.230' to '24.0.63.230'

    You can apply this fix to your dataframe using:

    temp2['IP_ADDR'] = temp2['IP_ADDR'].replace(r'\.0+\.', '.0.', regex=True)
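    Note that the regex above consumes the trailing dot of each match, so two adjacent zero-padded parts (e.g. '1.00.00.1') are only partly fixed, and padded non-zero parts such as '083' are left alone. Below is a rough sketch of a more general cleanup, plus a helper that returns latitude and longitude as a pair. normalize_ipv4 and lat_lon are hypothetical names, not part of ip2geotools. The sketch assumes the lookup result exposes .latitude and .longitude attributes (matching the fields in the JSON output), and the broad except is just a placeholder for whatever error handling you actually want:

```python
import re


def normalize_ipv4(ip: str) -> str:
    """Strip leading zeros from each part of a dotted IPv4 string.

    Anything that is not four dot-separated groups of ASCII digits
    (for example an IPv6 address) is returned unchanged.
    """
    if not re.fullmatch(r"[0-9]+(\.[0-9]+){3}", ip):
        return ip
    # int() drops leading zeros: "000" -> 0, "083" -> 83
    return ".".join(str(int(part)) for part in ip.split("."))


def lat_lon(ip, lookup):
    """Return (latitude, longitude) for one address, or (None, None) on failure.

    `lookup` is any callable that takes an IP string and returns an object
    with .latitude and .longitude attributes (assumed here).
    """
    try:
        loc = lookup(normalize_ipv4(ip))
    except Exception:  # placeholder: narrow this to the errors you expect
        return (None, None)
    return (loc.latitude, loc.longitude)


# Possible usage with the question's DataFrame (untested here, needs network):
# from ip2geotools.databases.noncommercial import DbIpCity
# coords = temp2['IP_ADDR'].apply(
#     lambda ip: lat_lon(ip, lambda a: DbIpCity.get(a, api_key='free')))
# temp2[['latitude', 'longitude']] = pd.DataFrame(coords.tolist(), index=temp2.index)
```

    Keep in mind that .apply still makes one request per row, so this is a row-wise loop rather than a truly vectorized operation.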