Search code examples
pythonpandasnumpyrowsnan

Keep the last n real values of uneven rows in a dataframe?


I am collecting heart rate values over the course of time. Each subject varies in the length of time that data was collected. I would like to make a table of the last 2 seconds of collected data.

import pandas as pd 
import numpy as np

#example data
example_s = [["4/20/21 4:20", 302, 0, 0, 1, 2, 3],
       ["2/17/21 9:20",135, 1, 1.4, 8, 10, np.NaN, np.NaN],
       ["2/17/21 9:20", 111, 5, 5,1, np.NaN, np.NaN,np.NaN, np.NaN]]
example_s_table = pd.DataFrame(example_s,columns=['Date_Time','CID', 0, 1, 2, 3, 4, 5, 6])


desired_outcome = [["4/20/21 4:20",302,1, 2, 3],
       ["2/17/21 9:20",135, 1.4, 8, 10 ],
       ["2/17/21 9:20",111, 5, 5,1 ]]

desired_outcome_table = pd.DataFrame(desired_outcome,columns=['Date_Time','CID', "Second 1", "Second 2", "Second 3"])

I can see how to collect a single instance of the data from the example shown here, but would like to know how to quickly add multiple values to my table:

desired_outcome_table["Last Second"]=example_s_table.iloc[:,1:].ffill(axis=1).iloc[:, -1]

Python Dataframe Get Value of Last Non Null Column for Each Row


Solution

  • Try:

    df = example_s_table.copy()
    df = df.set_index(['Date_Time', 'CID'])
    
    df_out = df.mask(df.eq(0))\
               .apply(lambda x: pd.Series(x.dropna().tail(3).values), axis=1)\
               .rename(columns = lambda x: f'Second {x+1}')
    df_out['Last Second'] = df_out['Second 3']
    print(df_out.reset_index())
    

    Output:

          Date_Time  CID  Second 1  Second 2  Second 3  Last Second
    0  4/20/21 4:20  302       1.0       2.0       3.0          3.0
    1  2/17/21 9:20  135       1.4       8.0      10.0         10.0
    2  2/17/21 9:20  111       5.0       5.0       1.0          1.0