Search code examples
pythonpandasdata-analysis

How to calculate quoted spread in pandas for each day?


I have a data of bond market like this:

Id   row      Date       BuyPrice    SellPrice    Time
1    1      2017-10-30    94520       0          9:00:00
1    2      2017-10-30    94538       0          9:00:00
1    3      2017-10-30    94609       0          9:00:00
1    4      2017-10-30    94615       0          9:00:00
1    5      2017-10-30    94617       0          9:00:00
1    1      2017-09-20    99100       99159      9:00:10
1    2      2017-09-20    99102       99058      9:00:11
1    3      2017-09-20    99103       99057      9:00:12
1    4      2017-09-20    99104       99056      9:00:10
1    5      2017-09-20    99105       99055      9:00:10
1    1      2017-09-20    98100       99190      9:01:10
1    2      2017-09-20    98099       99091      9:01:10
1    3      2017-09-20    98098       99092      9:01:10
1    4      2017-09-20    98097       99093      9:01:10
1    5      2017-09-20    98096       99094      9:01:10
2    1      2010-11-01    99890       100000     10:00:02
2    2      2010-11-01    99899       100000     10:00:02
2    3      2010-11-01    99901       99899      9:00:02
2    4      2010-11-01    99920       99850      10:00:02
2    5      2010-11-01    99933       99848      10:00:23

Step 1:

I want to calculate the spread(=SellPrice - BuyPrice) for row number one for each id for each day and exclude zeros if there is a zero in BuyPrice or SellPrice(Report nan for this kind of data), data in this step should be like this:

id     row      Date         BuyPrice      SellPrice     Spread
1      1        2017-10-30   94520         0             NaN
1      1        2017-09-20   99100         99159         59
1      1        2017-09-20   98100         99190         190
2      1        2010-11-01   99890         100000        110

Step 2:

Now i want to calculate average of Spread for each day for each id and give index respect to date

At last data should be like this:

Id    Date        avg.spread(average of spread for each day)   index
1     2017-10-30   NaN                                           1
1     2017-09-20   124.5(=(59+190)/2)                            2
2     2010-11-01   110                                           1

Solution

  • I've tried my best to understand what you want, although you haven't explicitly mentioned it, I think you want to groupby on Id, row, and date.

    g = df.assign(diff=df.SellPrice.sub(df.BuyPrice))\
                     .groupby(['Id', 'row', 'Date']).diff.mean()
    
    v = g.groupby(level=[0, 1]).cumcount().add(1).values
    df = g.reset_index().assign(index=v)
    
    df
    
        Id  row        Date   diff  index
    0    1    1  2017-09-20  574.5      1
    1    1    1  2017-10-30    NaN      2
    2    1    2  2017-09-20  474.0      1
    3    1    2  2017-10-30    NaN      2
    4    1    3  2017-09-20  474.0      1
    5    1    3  2017-10-30    NaN      2
    6    1    4  2017-09-20  474.0      1
    7    1    4  2017-10-30    NaN      2
    8    1    5  2017-09-20  474.0      1
    9    1    5  2017-10-30    NaN      2
    10   2    1  2010-11-01  110.0      1
    11   2    2  2010-11-01  101.0      1
    12   2    3  2010-11-01   -2.0      1
    13   2    4  2010-11-01  -70.0      1
    14   2    5  2010-11-01  -85.0      1