Here is my datafarme 'df':
match name group
adamant Adamant Home Network 86
adamant ADAMANT, Ltd. 86
adamant bild TOV Adamant-Bild 86
360works 360WORKS 94
360works 360works.com 94
Per group number I want to compare the names one by one and see if they are matched to a same word from the 'match' column.
So desired output will be counts:
If they match we count it as 'TP' and if not we count it as 'FN'.
I had an idea of counting number of match words per group number but that would not help completely with what I want:
df.groupby(group).count()
Does any body have an idea how to do it?
If I understood well your question, this should do the work:
import re
import pandas
df = pandas.DataFrame([['adamant', 'Adamant Home Network', 86], ['adamant', 'ADAMANT, Ltd.', 86],
['adamant bild', "TOV Adamant-Bild", 86], ['360works', '360WORKS', 94],
['360works ', "360works.com ", 94]], columns=['match', 'name', 'group'])
def my_function(group):
for i, row in group.iterrows():
if ''.join(re.findall("[a-zA-Z]+", row['match'])).lower() not in ''.join(
re.findall("[a-zA-Z]+", row['name'])).lower():
# parsing the names in each columns and looking for an inclusion
# if one of the inclusion fails, we return 'FN'
return 'FN'
# if all inclusions succeed, we return 'TP'
return 'TP'
res_series = df.groupby('group').apply(my_function)
res_series.name = 'count'
res_df = res_series.reset_index()
print res_df
This will give you this DataFrame:
group count
1 86 'TP'
2 94 'TP'