I have a large dataframe. Here is a small sample to illustrate my question.
A B C
car red 15
car blue 20
car grey 14
bike red 6
bike blue 8
phone red 9
phone blue 11
phone grey 10
Let's say column C shows the price. I want to add a column called "D" that answers the question "Is the red car more expensive than the mean price of all cars?", and likewise for the other values of A. This is the result I want to see:
A B C D
car red 15 cheap
car blue 20 expensive
car grey 14 cheap
bike red 6 cheap
bike blue 8 expensive
phone red 9 cheap
phone blue 11 expensive
phone grey 10 cheap
I have tried many ways to do this. I thought the code below would finally solve my problem, but it didn't. I tried the same thing with a while loop, but I keep getting KeyError: 0. What should I do? Here is the code I tried:
df["D"] = "cheap"
A.values = df.A.unique()
for b in A.values:
    for i in range(len(df.loc[data.A== b])):
        if df.loc[df.A== b, "C"][i] >= df.loc[df.A== b, "C"].mean():
            df.loc[df.A== b, "D"][i] = "expensive"
Use groupby with transform('mean') to get each group's mean aligned to the original rows, then np.where to label them:
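import numpy as np
# Mean of C within each A group, one value per original row
s = df.groupby('A').C.transform('mean')
# Rows priced above their group mean are 'expensive', the rest 'cheap'
df['D'] = np.where(df.C > s, 'expensive', 'cheap')
df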
Out[158]:
A B C D
0 car red 15 cheap
1 car blue 20 expensive
2 car grey 14 cheap
3 bike red 6 cheap
4 bike blue 8 expensive
5 phone red 9 cheap
6 phone blue 11 expensive
7 phone grey 10 cheap
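For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The way df is built is an assumption (the question only shows the printed table), and the variable names group_mean, bikes and first_bike_price are made up for illustration. It also shows the likely cause of the KeyError: 0 in the loop version: a filtered Series keeps its original row labels, so [i] on it is a label lookup, not a positional one.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the table in the question (assumed construction)
df = pd.DataFrame({
    "A": ["car", "car", "car", "bike", "bike", "phone", "phone", "phone"],
    "B": ["red", "blue", "grey", "red", "blue", "red", "blue", "grey"],
    "C": [15, 20, 14, 6, 8, 9, 11, 10],
})

# transform('mean') returns a Series with the same index as df,
# where every row holds the mean of C for that row's A group
group_mean = df.groupby("A")["C"].transform("mean")

# Strictly greater than the group mean -> 'expensive', otherwise 'cheap'
df["D"] = np.where(df["C"] > group_mean, "expensive", "cheap")

# Why the loop fails: the bike rows keep their original labels 3 and 4,
# so bikes[0] looks for a label 0 that does not exist in this subset
bikes = df.loc[df["A"] == "bike", "C"]
first_bike_price = bikes.iloc[0]  # positional access works

print(df)
```

Note that the comparison is strict (>), which is what labels "phone grey" (price 10, group mean 10) as cheap in the desired output; the >= in the loop version would label it expensive. If a loop is really needed, .iloc is the positional accessor, but the vectorized version avoids both the label problem and the chained assignment in df.loc[...]["D"][i] = ..., which may write to a temporary copy rather than to df.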