
How to assign an item in a pandas dataframe after checking for conditions?


I am iterating through a pandas dataframe (originally a csv file) and checking each row of a certain column for specific keywords. Each keyword group that appears at least once adds 1 to a score. There are 7 keyword groups, and if the score is >= 6, I would like to assign a string (here "Software and application developer") to another column in the same row and save the score. Unfortunately, the score ends up the same in every row, which I find hard to believe. This is my code so far:

for row in data.iterrows():
    devScore=0
    if row[1].str.contains("developer").any() | row[1].str.contains("developpeur").any():
        devScore=devScore+1
    if row[1].str.contains("symfony").any():
        devScore=devScore+1
    if row[1].str.contains("javascript").any():
        devScore=devScore+1
    if row[1].str.contains("java").any() | row[1].str.contains("jee").any():
        devScore=devScore+1
    if row[1].str.contains("php").any():
        devScore=devScore+1
    if row[1].str.contains("html").any() | row[1].str.contains("html5").any():
        devScore=devScore+1
    if row[1].str.contains("application").any() | row[1].str.contains("applications").any():
        devScore=devScore+1
    if devScore>=6:
        data["occupation"]="Software and application developer"
        data["score"]=devScore

Solution

  • These two lines assign a constant to the entire column, so every row ends up with the values from the last iteration that reached the threshold:

    data["occupation"]="Software and application developer"
    data["score"]=devScore
    

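    Instead, assign only to the current row by its index label:

    for idx, row in data.iterrows():
        devScore = 0
        # ... keyword checks as in the question; `row` is now the Series,
        # so use `row` where the question used `row[1]` ...
        if devScore >= 6:
            data.loc[idx, "occupation"] = "Software and application developer"
            data.loc[idx, "score"] = devScore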
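
    As a possible alternative to the row loop, the same scoring can be sketched with vectorized string matching. This is only a sketch under assumptions: the sample rows and the column name "description" are hypothetical, the text is searched in a single column rather than across the whole row, and the score is stored for every row instead of only those at or above the threshold.

```python
import pandas as pd

# Hypothetical sample data; the column name "description" is an assumption.
data = pd.DataFrame({
    "description": [
        "developer for php symfony applications using javascript, html5 and java",
        "accountant with experience in payroll",
    ]
})

# One regex per scoring group, mirroring the checks in the question.
# "html" already matches "html5", and "application" matches "applications".
keyword_groups = [
    "developer|developpeur",
    "symfony",
    "javascript",
    "java|jee",
    "php",
    "html",
    "application",
]

text = data["description"]

# Each group contributes 1 to a row's score when it matches at least once.
data["score"] = sum(
    text.str.contains(pattern, regex=True).astype(int)
    for pattern in keyword_groups
)

# Label only the rows that reach the threshold; the others stay missing.
data["occupation"] = data["score"].ge(6).map(
    {True: "Software and application developer", False: None}
)
```

    Because every assignment here is computed per row, no row can overwrite another row's score, which is the problem the loop in the question ran into.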