Tags: python, pandas, data-analysis

How to select rows and assign value based on condition?


I have a dataframe like this:

 Id   row      Date       BuyPrice    SellPrice    Time
 1    1      2017-10-30    94520       0          9:00:00
 1    2      2017-10-30    94538       0          9:00:00
 1    3      2017-10-30    94609       0          9:00:00
 1    4      2017-10-30    94615       0          9:00:00
 1    5      2017-10-30    94617       0          9:00:00
 1    1      2017-09-20    99100       99159      9:00:10
 1    2      2017-09-20    99102       99058      9:00:11
 1    3      2017-09-20    99103       99057      9:00:12
 1    4      2017-09-20    99104       99056      9:00:10
 1    5      2017-09-20    99105       99055      9:00:10
 1    1      2017-09-20    98100       99190      9:01:10
 1    2      2017-09-20    98099       99091      9:01:10
 1    3      2017-09-20    98098       99092      9:01:10
 1    4      2017-09-20    98097       99093      9:01:10
 1    5      2017-09-20    98096       99094      9:01:10
 2    1      2010-11-01    99890       100000     10:00:02
 2    2      2010-11-01    99899       100000     10:00:02
 2    3      2010-11-01    99901       99899      9:00:02
 2    4      2010-11-01    99920       99850      10:00:02
 2    5      2010-11-01    99933       99848      10:00:23

I want to calculate SellPrice - BuyPrice for rows where the column "row" equals 1 (for each day and each id separately). If BuyPrice or SellPrice is 0, then NaN should be assigned.

The target output should look like this:

Id  row      Date         BuyPrice   SellPrice   Spread
1   1        2017-10-30    94520        0         NaN
1   1        2017-09-20    99100      99159       59
1   1        2017-09-20    98100      99190       1090
2   1        2010-11-01    99890      100000      110

Here is the code I've tried so far:

 df1 = df.groupby(['SID','Date'], sort=False) 
 df1['Spread'] =np.where((df['row']==1).eq(0).any(1),np.nan,df['SellPrice']-df['BuyPrice'])

but I'm getting this error:

 ValueError: No axis named 1 for object type <class 'pandas.core.series.Series'>

Solution

  • If your data already has a row column as you've shown here, you don't need groupby. Simply use query to filter the rows, as @cmaher suggests.

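    # mask() turns a price of 0 into NaN, so Spread is NaN when either price is 0
    df.query('row == 1').assign(Spread = 
                                 df['SellPrice'].mask(df['SellPrice'].eq(0)) - 
                                 df['BuyPrice'].mask(df['BuyPrice'].eq(0)))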
    

    Output:

        Id  row        Date  BuyPrice  SellPrice      Time  Spread
    0    1    1  2017-10-30     94520          0   9:00:00     NaN
    5    1    1  2017-09-20     99100      99159   9:00:10    59.0
    10   1    1  2017-09-20     98100      99190   9:01:10  1090.0
    15   2    1  2010-11-01     99890     100000  10:00:02   110.0
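
As for the error in the question: `(df['row']==1).eq(0)` is a boolean Series, and a Series only has axis 0, so `.any(1)` raises the ValueError. Note also that `df1` is a GroupBy object rather than a DataFrame, so assigning a column to it would likely fail as well. Below is a minimal sketch of an `np.where` variant that stays close to the original attempt. The sample frame is a hand-picked subset of the question's data, and the names `first` and `has_zero` are only illustrative.

```python
import numpy as np
import pandas as pd

# Hand-picked subset of the question's data (Time column omitted for brevity).
df = pd.DataFrame({
    "Id": [1, 1, 1, 1, 2],
    "row": [1, 2, 1, 1, 1],
    "Date": ["2017-10-30", "2017-10-30", "2017-09-20", "2017-09-20", "2010-11-01"],
    "BuyPrice": [94520, 94538, 99100, 98100, 99890],
    "SellPrice": [0, 0, 99159, 99190, 100000],
})

# Keep only the lines where row == 1; copy so the new column is added safely.
first = df[df["row"] == 1].copy()

# any(axis=1) is valid here because it is called on a two-column DataFrame,
# not on a Series. True wherever BuyPrice or SellPrice is 0.
has_zero = first[["BuyPrice", "SellPrice"]].eq(0).any(axis=1)

# NaN where a price is 0, otherwise SellPrice - BuyPrice.
first["Spread"] = np.where(has_zero, np.nan, first["SellPrice"] - first["BuyPrice"])
print(first)
```

The original index labels are preserved, so the result can still be matched back to the full frame if needed.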