python · pandas · dataframe · performance · memory-efficient

What is the best way to iterate over Pandas rows in my case?


I have a performance problem with iterating over Pandas rows. I have a dataset with over 30k rows, and for each row I need to add a value to a new column, computed from specific columns.

belongs_node_df = pd.DataFrame.from_records(belongs_node, columns=['hashtag', 'tweets_id', 'tokenized_text','sentiment_compound'])
posted_node_df = pd.DataFrame.from_records(posted_node, columns=['username', 'num_followers', 'tweets_id'])
df_user_hashtag = pd.merge(posted_node_df, belongs_node_df, on='tweets_id', how='outer').sort_values('username')
df_user_hashtag['p'] = None

for i in range(len(df_user_hashtag)):
    df_user_hashtag['p'][i] = 3 * df_user_hashtag['num_followers'][i] \
        / df_user_hashtag['sentiment_compound'][i]

Is there an efficient way to perform this operation for each row? Thanks a lot. :)


Solution

  • You should not iterate over the rows ... doing so throws away pretty much all of the benefits you get from using pandas. Compute the whole column with a single vectorized expression instead:

    df_user_hashtag['p'] = 3 * df_user_hashtag['num_followers'] / df_user_hashtag['sentiment_compound']
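
  • For reference, here is a minimal self-contained sketch of the vectorized approach (the frame and numbers below are invented for illustration; they are not the asker's data). One thing to watch for: the outer merge can leave NaN in num_followers or sentiment_compound, and a sentiment_compound of 0 produces inf after the division, so a cleanup step may be useful.

    import numpy as np
    import pandas as pd

    # Stand-in for df_user_hashtag; in the question this frame comes from the merge above.
    df = pd.DataFrame({
        'num_followers': [120, 45, 3000, 10],
        'sentiment_compound': [0.5, -0.2, 0.9, 0.0],
    })

    # One vectorized operation over entire columns instead of a Python-level loop.
    df['p'] = 3 * df['num_followers'] / df['sentiment_compound']

    # Division by zero yields inf; replace it (and leave any NaN from the merge) as needed.
    df['p'] = df['p'].replace([np.inf, -np.inf], np.nan)

    print(df)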