python pandas dataframe nlp nltk

Not able to assign values to a column (Bag_of_words)


I am trying to assign values to a column in my pandas DataFrame, but the column comes out blank. Here is the code:

df['Bag_of_words'] = ''
columns = ['genre', 'director', 'actors', 'key_words']


for index, row in df.iterrows():
    words = ''
    for col in columns:
        words += ' '.join(row[col]) + ' '
    row['Bag_of_words'] = words


The output is an empty column, and I am not getting any errors. Can someone please help me understand what is happening here?


Solution

  • From the iterrows documentation:

    "You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect."

    In your loop, row[...] = ... writes to row, which turns out to be a copy, so the assignment never reaches the original DataFrame.
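As a minimal sketch of that behaviour, the snippet below uses a small made-up frame (the sample values and the integer `year` column are assumptions, added only so the dtypes are mixed and `iterrows` has to hand back copies). It first shows that writing to `row` leaves `df` untouched, then keeps the original loop but writes through `df.at` instead, which is one possible minimal fix:

```python
import pandas as pd

# Hypothetical sample data; "year" is an assumed extra column that makes
# the dtypes mixed, so each row yielded by iterrows() is a fresh copy.
df = pd.DataFrame({
    "genre": [["Action", "Drama"], ["Comedy"]],
    "director": [["Nolan"], ["Wright"]],
    "year": [2008, 2004],
})
df["Bag_of_words"] = ""
columns = ["genre", "director"]

# Writing to `row` only changes the temporary Series, not df.
for index, row in df.iterrows():
    row["Bag_of_words"] = "changed"
after_row_write = df["Bag_of_words"].tolist()

# Writing through df.at addresses the cell in the DataFrame itself.
for index, row in df.iterrows():
    words = ""
    for col in columns:
        words += " ".join(row[col]) + " "
    df.at[index, "Bag_of_words"] = words
after_at_write = df["Bag_of_words"].tolist()
```

Here `after_row_write` still holds the empty strings, while `after_at_write` holds the joined words.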

    iterrows is generally discouraged anyway, so you can instead:

    • join each row's list of words into a string, column by column

    • join those strings row-wise with " ".join

    • append a trailing space to each result

    df["Bag_of_words"] = (df[columns].apply(lambda col: col.str.join(" "))
                                     .agg(" ".join, axis="columns")
                                     .add(" "))
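To try the vectorized version end to end, here is a self-contained sketch. The sample rows are invented for illustration and assume, as in the question, that each of the four columns holds a list of strings:

```python
import pandas as pd

# Hypothetical sample data: every cell is a list of strings.
df = pd.DataFrame({
    "genre": [["Action", "Drama"], ["Comedy"]],
    "director": [["Nolan"], ["Wright"]],
    "actors": [["Bale", "Ledger"], ["Pegg"]],
    "key_words": [["hero", "city"], ["zombie"]],
})
columns = ["genre", "director", "actors", "key_words"]

# Step 1: turn each list into a space-separated string, column by column.
joined = df[columns].apply(lambda col: col.str.join(" "))

# Step 2: join the per-column strings across each row, then add the
# trailing space that the original loop also produced.
df["Bag_of_words"] = joined.agg(" ".join, axis="columns").add(" ")
```

The trailing `.add(" ")` is only there to reproduce the loop's output exactly; it can be dropped if the extra space is not wanted.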