I'm trying to make a column called average_sales
, but the result is NaN
. When I put them in the varible a
, it works correctly.
Please let me know how can I solve this problem. Ignore Korean letters.
As Rawson said in his comment:
When you use groupby, the index will be different, so df and the new pd.Series do not align, causing the NaN values.
What this means in practice is that you cant simply re-add the column in the way you are trying to (df["average_sales"] = ...)
. Pandas does not know how to align these things since it is expecting to be able to match values in index of the series you are passing to the values in the index of the existing dataframe. That can't work for many reasons here; one being the fact that your dataframe has all integers in the index and the index of the grouped series is a bunch of strings corresponding to values in the column you grouped on.
What might be more helpful for your intuition for why this wouldn't work is that they are not the same size either. Your original dataframe is at least 21 entries long, but you only have 11 unique values in the column you grouped on (not counting nulls). When adding a new column this way we can usually expect to be passing in a series that has the same length as the number of rows that are in the dataframe.
What I think you are trying to do is define an average sales value for each of the types of entries in the groupby. If that is the case, you could do the following:
(Since you provided screenshots I can't copy the names of your columns, so I will be using "company" for the grouper variable and "sales" for the value you are summing on.)
Perform your initial grouping:
average_sales_df = df.groupby(["company"])[["sales"]].mean().round(2)
Note that I have placed double square brackets around "sales". This is so that a dataframe is returned and not a series
Now lets reset the index so that it is a usable column:
average_sales_df.reset_index(inplace=True)
Optional step to rename the column that we calculated the mean for:
average_sales_df.rename(columns={"sales":"average_sales"}, inplace=True)
Finally we can merge this dataframe back on your original one which will provide the same number for average sales for each of the values of "company" in the original df:
merged = df.merge(average_sales_df, on="Sales")
Let me know if what you see in merged
is what you wanted. Hope this helps!