Search code examples
pythonpandasdataframediff

Inserting row difference values in original DataFrame


I have a read.csv DataFrame which is continuously updating with addition of one new row on every run of my script which looks like ....

df = pd.read_csv(file_path)
print(df.to_string(index=False))

timestamp    Puts   Calls  PutCh  CallCh  ChDiff

09:41:12 AM 2027891 1820724 280101  200974   79127
09:48:51 AM 2075976 1862053 328186  242303   85883
09:58:48 AM 2091487 1885842 343697  266092   77605
10:08:21 AM 2091879 1918592 344089  298842   45247
02:26:00 PM 1995234 1941917 247444  322167  -74723
02:44:36 PM 1990071 1934874 242281  315124  -72843
02:56:17 PM 1970892 1938472 223102  318722  -95620

Now I want the difference of each succeeding row from previous one for which I have read about df.diff(). So I dropped the timestamp column to get a new datafame as df1 and wrote my script ...

df1.diff()

and got my output as ....

    Puts   Calls    PutCh  CallCh    ChDiff
     NaN     NaN      NaN     NaN       NaN
 48085.0 41329.0  48085.0 41329.0    6756.0
 15511.0 23789.0  15511.0 23789.0   -8278.0
   392.0 32750.0    392.0 32750.0  -32358.0
-96645.0 23325.0 -96645.0 23325.0 -119970.0
 -5163.0 -7043.0  -5163.0 -7043.0    1880.0
-19179.0  3598.0 -19179.0  3598.0  -22777.0

Here I want these difference values are added to my original DataFrame(df) in bracket for every column. More elaborately my output should looks like(here timestamp column also should be there as in my df)....

Puts    Calls   PutCh   CallCh  ChDiff
2027891 1820724 280101  200974  79127
2075976 1862053 328186  242303  85883
(48085) (41329) (48085) (41329) (6756)
2091487 1885842 343697  266092  77605
(15511) (23789) (15511) (23789) (-8278)
2091879 1918592 344089  298842  45247
(392)   (32750) (392)   (32750) (-32358)

Is there any way to do same plz ??


Solution

  • You can convert the difference as string to add '( )' then concat your dataframes before sorting the result:

    df1 = '(' + df.iloc[:, 1:].diff().dropna(how='all').astype(int).astype(str) + ')'
    out = pd.concat([df, df1]).fillna('').sort_index(kind='stable')
    out.index = np.where(out.index.duplicated(), '', out.index)
    

    Output:

    >>> out
         timestamp      Puts    Calls     PutCh   CallCh     ChDiff
    0  09:41:12 AM   2027891  1820724    280101   200974      79127
    1  09:48:51 AM   2075976  1862053    328186   242303      85883
                     (48085)  (41329)   (48085)  (41329)     (6756)
    2  09:58:48 AM   2091487  1885842    343697   266092      77605
                     (15511)  (23789)   (15511)  (23789)    (-8278)
    3  10:08:21 AM   2091879  1918592    344089   298842      45247
                       (392)  (32750)     (392)  (32750)   (-32358)
    4  02:26:00 PM   1995234  1941917    247444   322167     -74723
                    (-96645)  (23325)  (-96645)  (23325)  (-119970)
    5  02:44:36 PM   1990071  1934874    242281   315124     -72843
                     (-5163)  (-7043)   (-5163)  (-7043)     (1880)
    6  02:56:17 PM   1970892  1938472    223102   318722     -95620
                    (-19179)   (3598)  (-19179)   (3598)   (-22777)