I am iterating over a pandas DataFrame (originally a CSV file) and checking for specific keywords in each row of a certain column. Each keyword that appears at least once adds 1 to a score. There are about 7 keywords, and if the score is >= 6, I would like to set another column in the same row to a string (here "Software and application developer") and save the score. Unfortunately, the score ends up the same in every row, which I find hard to believe. This is my code so far:
for row in data.iterrows():
    devScore=0
    if row[1].str.contains("developer").any() | row[1].str.contains("developpeur").any():
        devScore=devScore+1
    if row[1].str.contains("symfony").any():
        devScore=devScore+1
    if row[1].str.contains("javascript").any():
        devScore=devScore+1
    if row[1].str.contains("java").any() | row[1].str.contains("jee").any():
        devScore=devScore+1
    if row[1].str.contains("php").any():
        devScore=devScore+1
    if row[1].str.contains("html").any() | row[1].str.contains("html5").any():
        devScore=devScore+1
    if row[1].str.contains("application").any() | row[1].str.contains("applications").any():
        devScore=devScore+1
    if devScore>=6:
        data["occupation"]="Software and application developer"
        data["score"]=devScore
You are assigning a constant to the whole column here:
data["occupation"]="Software and application developer"
data["score"]=devScore
Instead, assign only to the current row, using the row index:
for idx, row in data.iterrows():
    # ... same keyword checks and devScore test as before ...
    data.loc[idx, "occupation"]="Software and application developer"
    data.loc[idx, "score"]=devScore
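For reference, here is a minimal runnable sketch of the row-wise assignment. The sample data, the column name "description", and the keyword_groups list are assumptions made up for illustration; they are not from your CSV. It also uses plain substring checks on one column instead of row[1].str.contains, which is a simplification rather than a drop-in replacement for your loop.

```python
import pandas as pd

# Hypothetical sample data; the column name "description" is an assumption.
data = pd.DataFrame({
    "description": [
        "developer symfony javascript java php html application",
        "accountant with excel skills",
        "developpeur php html5 applications javascript jee",
    ]
})

# One tuple per keyword group; a group adds 1 if any of its words occurs.
# "html" also matches "html5" and "application" also matches "applications",
# so those variants are not listed separately here.
keyword_groups = [
    ("developer", "developpeur"),
    ("symfony",),
    ("javascript",),
    ("java", "jee"),
    ("php",),
    ("html",),
    ("application",),
]

# Create the target columns up front so every row has a default value.
data["occupation"] = None
data["score"] = 0

for idx, row in data.iterrows():
    text = str(row["description"]).lower()
    # Count how many keyword groups have at least one substring match.
    dev_score = sum(any(word in text for word in group) for group in keyword_groups)
    # Assign to this row only, not to the whole column.
    data.loc[idx, "score"] = dev_score
    if dev_score >= 6:
        data.loc[idx, "occupation"] = "Software and application developer"

print(data[["score", "occupation"]])
```

With this sample data the scores differ per row (7, 0 and 6), and only the rows scoring at least 6 get the occupation string. Note that a substring check for "java" also matches "javascript", just as str.contains("java") does in your code.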