Search code examples
pandasdataframeconditional-statementsfinance

Creating a new column in pandas dataframe with selective values based on condition on another row


I have a dataframe (df) with 10 columns of Prices of company share prices with associated data. The index has many different dates, however there are multiple identical dates (and it is sorted by date). Additionally the important columns for this problem are df['Cond1'] and df['Cond2'] and df['Returns'].

Here is an example of the data with 3 columns for only 2 index values (1/21/2017 and 1/22/2017), in reality there are multiple dates with multiple variables etc etc.

            Name    Cond1   Cond2   Returns
1/21/2017   Apple       2   4   0.052450819
1/21/2017   Blackberry  6   5   0.423446578
1/21/2017   Microsoft   3   2   0.073850562
1/21/2017   IBM         1   1   0.966576931
1/21/2017   Ubisoft     5   7   0.371786953
1/21/2017   Next        4   3   0.58357725
1/21/2017   Marks and Spencer   2   7   0.466737922
1/21/2017   Alpha       4   3   0.291305661
1/21/2017   Right move  6   2   0.206502435
1/21/2017   Topsy       7   5   0.655331635
1/21/2017   Pizza hut   4   7   0.295723144
1/21/2017   Mcdonalds   3   4   0.338535647
1/22/2017   IBM         2   3   0.975326708
1/22/2017   Next        1   5   0.70893239
1/22/2017   Alpha       1   3   0.362154048
1/22/2017   Blackberry  6   2   0.664525792
1/22/2017   Apple       6   6   0.363531989

Now I want to create two columns ['Returns2'] and ['Returns3']

Returns 2 = new column in dataframe which only shows the returns for the date and 12 periods forward for that particular company if Cond1 < Cond2.

Returns 3 = new column in dataframe which only shows the returns for 1 day ahead and for 12 periods forward from that day for that particular company if Cond1

So ultimately I want to have 12 consecutive returns for 12 days for the company which satisfied Cond1


Solution

  • You can do an approach like this:

    df = df.set_index('Name', append=True).swaplevel().sort_index()
    df.loc[df.Cond1< df.Cond2, 'returns2'] = True
    df.returns2 = df.groupby(level=0).returns2.transform(lambda x: x.ffill(limit=12))
    df.returns2 = df.returns2.mask(df.returns2.notnull(), df.Returns)
    df.returns2
    Name                  
    Alpha       2017-01-21          NaN
                2017-01-22     0.362154
    Apple       2017-01-21    0.0524508
                2017-01-22     0.363532
    Blackberry  2017-01-21          NaN
                2017-01-22          NaN
    IBM         2017-01-21          NaN
                2017-01-22     0.975327
    Mcdonalds   2017-01-21     0.338536
    Microsoft   2017-01-21          NaN
    MnSpencer   2017-01-21     0.466738
    Next        2017-01-21          NaN
                2017-01-22     0.708932
    Pizzahut    2017-01-21     0.295723
    Rightmove   2017-01-21          NaN
    Topsy       2017-01-21          NaN
    Ubisoft     2017-01-21     0.371787
    Name: test, dtype: object