Search code examples
pythonpython-3.xpandaspython-3.7

How can I add a new column to a data frame using a function that accesses data from both current and previous row?


I have a data frame with data for several days: The code

import pandas

[...]

daily_data_f = pandas.DataFrame(daily_data, columns = ['Day', 'Total TODO/TODOE count'])

print(daily_data_f)

generates following output:

          Day  Total TODO/TODOE count
0  2020-05-16                      35
1  2020-05-17                      35
2  2020-05-18                      35
3  2020-05-19                      35
4  2020-05-20                      35
..        ...                     ...
64 2020-07-18                      35
65 2020-07-19                      35
66 2020-07-20                      35
68 2020-07-21                     151

I want to calculate the difference between the values of Total TODO/TODOE count on two subsequent days. The value jumped from 35 on 2020-06-28 to 151 on 2020-07-21. The value I want to calculate for 2020-07-21 151-35=116.

This answer suggests this approach:

df['new_column_name'] = df.apply(lambda x: my_function(x['value_1'], x['value_2']), axis=1)

I would have to write something like this:

daily_data_f['First Derivative'] = daily_data_f.apply(lambda x:diff(daily_data_f['Total TODO/TODOE count'], <PREVIOUS_VALUE>), axis=1)

where <PREVIOUS_VALUE> is the value of 'Total TODO/TODOE count' from the previous row (day).

Question: How can write an expression for <PREVIOUS_VALUE> (value of 'Total TODO/TODOE count' from the previous row)?


Solution

  • You can use numpy.diff or pandas.DataFrame.diff as below, numpy approach should be slightly faster:

    numpy:

    import numpy as np
    df['diff'] = np.diff(df['Total TODO/TODOE count'], prepend=np.nan)
    

    pandas:

    import pandas as pd
    df['diff'] = df['Total TODO/TODOE count'].diff()
    

    Output:

    Day Total TODO/TODOE count  diff
    0   2020-05-16  35  NaN
    1   2020-05-17  35  0.0
    2   2020-05-18  35  0.0
    3   2020-05-19  35  0.0
    4   2020-05-20  35  0.0
    64  2020-07-18  35  0.0
    65  2020-07-19  35  0.0
    66  2020-07-20  35  0.0
    68  2020-07-21  151 116.0