Search code examples
pythonpandasdataframemissing-datadata-preprocessing

How to specify and locate cells using Pandas and use fillna


I am using a dataset that can be found on Kaggle website (https://www.kaggle.com/claytonmiller/lbnl-automated-fault-detection-for-buildings-data). I am trying to write a code that can specify based on Timestamp to look for those specific rows and apply a condition (In the context of this dataset the time between 10:01 PM to 6:59 AM) and fill all the columns corresponding to those specific rows with zero.

I have tried the following code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

%matplotlib inline
df = pd.read_csv('RTU.csv')

def fill_na(row):
    if dt.time(22, 1) <= pd.to_datetime(row['Timestamp']).time() <= dt.time(6, 59):
        row.fillna(0)
        
### df = df.apply(fill_na, axis=1) ###
df= df.apply(lambda row : fill_na(row), axis=1)
#### df.fillna(0, inplace=True) ###
df.head(2000)

However after changing the axis of the dataset it seems it can no longer work as intended.


Solution

  • I don't think you need a function to do that. Just filter the rows using a condition and then fillna.

    import datetime as dt
    import pandas as pd
    
    df = pd.read_csv('RTU.csv',parse_dates=['Timestamp'])
    df.head()
    
    cond = (df.Timestamp.dt.time > dt.time(22,0)) | ((df.Timestamp.dt.time < dt.time(7,0)))
    df[cond] = df[cond].fillna(0,axis=1)
    

    Shows that the na before 7am fill with 0

    enter image description here