python pandas loops output jaro-winkler

Apply Python function to each row and append


I have the following data:

[image: input data]

I am trying to use the pyjarowinkler library to find the distance between strings. My hello-world code works:

# Hello World
from pyjarowinkler import distance

d1 = distance.get_jaro_distance("Hello", "hello", winkler=True, scaling=0.1)
d1

When I try to iterate over each row or use apply, my code fails. Can someone please point me in the right direction?

#Import data 
import pandas
df = pandas.read_csv('data.csv')
from pyjarowinkler import distance
score=df.apply(distance.get_jaro_distance(df[S1],df[Stores]))



# iterating over rows using iterrows() function  
for i, j in df.iterrows(): 
    print(i, j,distance.get_jaro_distance(i,j,winkler=True, scaling=0.1)) 
    print()

Error:

JaroDistanceException: Cannot calculate distance from NoneType (int, Series)

The expected output is:

[image: expected output]


Solution

  • I think you should be able to do:

    df['distance'] = df.apply(lambda d: distance.get_jaro_distance(d['S1'], d['store'], winkler=True, scaling=0.1), axis=1)

    Note the axis=1 argument passed to .apply: it tells pandas to apply the function row-wise rather than column-wise, so the lambda receives one row at a time. The column names ('S1' and 'store' here) must match the ones in your DataFrame.
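To make the row-wise behaviour concrete, here is a minimal sketch that runs without pyjarowinkler installed. It rests on a few assumptions: the `similarity` helper is a hypothetical stand-in built on the standard library's `difflib` (it is not Jaro-Winkler), the sample rows are made up, and the column names `S1` and `Stores` are taken from the question's code.

```python
from difflib import SequenceMatcher

import pandas as pd


def similarity(a, b):
    """Hypothetical stand-in scorer (NOT Jaro-Winkler): difflib ratio in [0, 1]."""
    return SequenceMatcher(None, str(a), str(b)).ratio()


# Made-up sample data; column names S1 and Stores are assumed from the question.
df = pd.DataFrame({
    "S1": ["Hello", "walmart", "target"],
    "Stores": ["hello", "walmart", "costco"],
})

# axis=1 passes each row to the lambda as a Series indexed by column name.
df["distance"] = df.apply(
    lambda row: similarity(row["S1"], row["Stores"]), axis=1
)

# Equivalent loop: iterrows() yields (index, row) pairs, so the two strings
# have to be read out of the row rather than taken from the pair itself.
loop_scores = [
    similarity(row["S1"], row["Stores"]) for _, row in df.iterrows()
]

print(df)
```

With pyjarowinkler available, the call `distance.get_jaro_distance(row["S1"], row["Stores"], winkler=True, scaling=0.1)` from the answer should slot in where `similarity(...)` is used. The loop also suggests why the question's version failed: in `for i, j in df.iterrows()`, `i` is the row index and `j` is the whole row as a Series, which matches the `(int, Series)` in the error message.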