python pandas numpy dataframe keyerror

Adding a new column based on other columns and rows


I have a large dataframe. Here is a sample dataframe to illustrate my question.

A      B      C     
car    red    15
car    blue   20
car    grey   14
bike   red    6
bike   blue   8
phone  red    9
phone  blue   11
phone  grey   10

Let's say column C shows the price. I want to add a column called "D" that answers the question "Is the red car more expensive than the mean price of all cars?", and likewise for the other values in A. This is the result I want to see:

A      B      C    D    
car    red    15   cheap
car    blue   20   expensive
car    grey   14   cheap
bike   red    6    cheap
bike   blue   8    expensive
phone  red    9    cheap
phone  blue   11   expensive
phone  grey   10   cheap

I have tried many ways to do this. I thought the code below would solve my problem, but it didn't. I tried the same thing with a while loop, but I keep getting KeyError: 0. What should I do? Here is the code I tried:

df["D"] = "cheap"
a_values = df.A.unique()
for b in a_values:
    for i in range(len(df.loc[df.A == b])):
        if df.loc[df.A == b, "C"][i] >= df.loc[df.A == b, "C"].mean():
            df.loc[df.A == b, "D"][i] = "expensive"

Solution

  • Use groupby with transform('mean') to get the mean price of each A group aligned to every row, then label the rows with np.where

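    import numpy as np

    # Mean of C within each A group, repeated for every row of that group
    s = df.groupby('A').C.transform('mean')
    # Strictly above the group mean is 'expensive', everything else 'cheap'
    df['D'] = np.where(df.C>s, 'expensive', 'cheap')
    df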
    Out[158]: 
           A     B   C          D
    0    car   red  15      cheap
    1    car  blue  20  expensive
    2    car  grey  14      cheap
    3   bike   red   6      cheap
    4   bike  blue   8  expensive
    5  phone   red   9      cheap
    6  phone  blue  11  expensive
    7  phone  grey  10      cheap
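
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample frame from the question. The names `bike_prices`, `bike_labels`, `raised_key_error` and `group_mean` are only illustrative, and the sketch assumes the frame has the default 0..n-1 index. It also shows what most likely causes the KeyError in the loop: the filtered Series keeps its original index labels, so `[i]` is a label lookup rather than "the i-th element", and the bike rows carry labels 3 and 4.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "A": ["car", "car", "car", "bike", "bike", "phone", "phone", "phone"],
    "B": ["red", "blue", "grey", "red", "blue", "red", "blue", "grey"],
    "C": [15, 20, 14, 6, 8, 9, 11, 10],
})

# Likely source of the KeyError: filtering keeps the original row labels,
# so the bike prices are labelled 3 and 4 and there is no label 0.
bike_prices = df.loc[df["A"] == "bike", "C"]
bike_labels = bike_prices.index.tolist()
try:
    bike_prices[0]            # label lookup on an integer index
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Vectorised alternative: broadcast each group's mean back to its rows,
# then compare every price with its own group's mean in one step.
group_mean = df.groupby("A")["C"].transform("mean")
df["D"] = np.where(df["C"] > group_mean, "expensive", "cheap")
print(df)
```

Positional access inside the loop would need `.iloc[i]` instead of `[i]`, but the vectorised version avoids the loop and the chained assignment altogether. Note that the comparison is strict: the grey phone costs exactly the phone mean of 10, so `>` labels it cheap as in the desired output, while the `>=` from the question's loop would label it expensive.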