python pandas dataframe timedelta

Change values of a timedelta column based on the previous row


Consider the following pandas DataFrame:

code  visit_time       flag   other  counter
0     NaT              True   X      3
0     1 days 03:00:12  False  Y      1
0     NaT              False  X      3
0     0 days 05:00:00  True   X      2
1     NaT              False  Z      3
1     NaT              True   X      3
1     1 days 03:00:12  False  Y      1
2     NaT              True   X      3
2     5 days 10:01:12  True   Y      0
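
For anyone who wants to reproduce this, the frame above could be built roughly as follows. This is only a sketch: the values are copied from the table, and the default integer index is an assumption, since the table does not show one.

```python
import pandas as pd

# Values copied from the table above; the default RangeIndex is an assumption.
df = pd.DataFrame({
    "code": [0, 0, 0, 0, 1, 1, 1, 2, 2],
    "visit_time": pd.to_timedelta([
        pd.NaT, "1 days 03:00:12", pd.NaT, "0 days 05:00:00",
        pd.NaT, pd.NaT, "1 days 03:00:12", pd.NaT, "5 days 10:01:12",
    ]),
    "flag": [True, False, False, True, False, True, False, True, True],
    "other": ["X", "Y", "X", "X", "Z", "X", "Y", "X", "Y"],
    "counter": [3, 1, 3, 2, 3, 3, 1, 3, 0],
})

print(df)
```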

Only the columns code, visit_time and flag are needed to solve the problem.

Every row with a non-null visit_time is preceded by a row whose visit_time is NaT. Knowing this, I want to make the following modification to the DataFrame:

  • Set the flag of each row with a non-null visit_time to the flag value of the previous row.

Expected result:

code  visit_time       flag   other  counter
0     NaT              True   X      3
0     1 days 03:00:12  True   Y      1
0     NaT              False  X      3
0     0 days 05:00:00  False  X      2
1     NaT              False  Z      3
1     NaT              True   X      3
1     1 days 03:00:12  True   Y      1
2     NaT              True   X      3
2     5 days 10:01:12  True   Y      0

Thanks in advance for any help.


Solution

  • You can use .mask to replace the 'flag' values with a .shift()-ed copy of the same column wherever 'visit_time' is not null.

    out = df.assign(
        flag=df['flag'].mask(df['visit_time'].notnull(), df['flag'].shift())
    )
    
    print(out)
       code      visit_time   flag other  counter
    0     0             NaT   True     X        3
    1     0 1 days 03:00:12   True     Y        1
    2     0             NaT  False     X        3
    3     0 0 days 05:00:00  False     X        2
    4     1             NaT  False     Z        3
    5     1             NaT   True     X        3
    6     1 1 days 03:00:12   True     Y        1
    7     2             NaT   True     X        3
    8     2 5 days 10:01:12   True     Y        0
    
    • .mask(condition, other) replaces the values where condition is True with the corresponding values of other. In this case, other is the value from the previous row.
    • .assign(…) updates a column and returns a new DataFrame. It can be replaced with a plain column assignment, df['flag'] = df['flag'].mask(…), to modify the DataFrame in place.
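
A minimal sketch of the two points above, on a made-up four-row frame. The frame and the names has_visit and previous_flag are illustrative and not part of the original answer. It also shows that .where with the inverted condition gives the same result as .mask.

```python
import pandas as pd

# Hypothetical four-row frame, just large enough to show the behaviour.
df = pd.DataFrame({
    "visit_time": pd.to_timedelta(
        [pd.NaT, "1 days 03:00:12", pd.NaT, "0 days 05:00:00"]
    ),
    "flag": [True, False, False, True],
})

has_visit = df["visit_time"].notnull()   # True on rows that have a visit_time
previous_flag = df["flag"].shift()       # flag of the previous row (missing on the first row)

# mask replaces where the condition is True ...
masked = df["flag"].mask(has_visit, previous_flag)
# ... while where keeps where the condition is True, so the condition is inverted.
kept = df["flag"].where(~has_visit, previous_flag)
assert masked.equals(kept)

# Plain column assignment modifies df itself instead of returning a new frame.
df["flag"] = masked
print(df)
```

Because .shift() leaves a missing value in the first row, the resulting column may come back with object dtype rather than bool, depending on the pandas version. The values themselves are the expected True/False flags.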

    If the column name is stored in a string variable (here, name), the same approach works:

    df[name] = df[name].mask(df['visit_time'].notnull(), df[name].shift())
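
If a new DataFrame is preferred over in-place assignment when the column name is held in a variable, the name can be passed to .assign through dictionary unpacking. The sketch below wraps this in a helper. The function name copy_previous_where_present, its time_col parameter and the small demo frame are hypothetical and chosen only for illustration.

```python
import pandas as pd


def copy_previous_where_present(df, name, time_col="visit_time"):
    """Return a new DataFrame in which column `name` takes the previous
    row's value on every row where `time_col` is not null."""
    has_time = df[time_col].notnull()
    # **{name: ...} lets .assign target a column whose name is only known at runtime.
    return df.assign(**{name: df[name].mask(has_time, df[name].shift())})


# Hypothetical demo frame (not the full frame from the question).
demo = pd.DataFrame({
    "visit_time": pd.to_timedelta([pd.NaT, "1 days 03:00:12", pd.NaT, "5 days 10:01:12"]),
    "flag": [True, False, True, True],
})

name = "flag"
out = copy_previous_where_present(demo, name)
print(out)
```

The original demo frame is left untouched, since .assign returns a copy.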