I am using a dataset that can be found on Kaggle website (https://www.kaggle.com/claytonmiller/lbnl-automated-fault-detection-for-buildings-data). I am trying to write a code that can specify based on Timestamp to look for those specific rows and apply a condition (In the context of this dataset the time between 10:01 PM to 6:59 AM) and fill all the columns corresponding to those specific rows with zero.
I have tried the following code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
%matplotlib inline
df = pd.read_csv('RTU.csv')
def fill_na(row):
if dt.time(22, 1) <= pd.to_datetime(row['Timestamp']).time() <= dt.time(6, 59):
row.fillna(0)
### df = df.apply(fill_na, axis=1) ###
df= df.apply(lambda row : fill_na(row), axis=1)
#### df.fillna(0, inplace=True) ###
df.head(2000)
However after changing the axis of the dataset it seems it can no longer work as intended.
I don't think you need a function to do that. Just filter the rows using a condition and then fillna.
import datetime as dt
import pandas as pd
df = pd.read_csv('RTU.csv',parse_dates=['Timestamp'])
df.head()
cond = (df.Timestamp.dt.time > dt.time(22,0)) | ((df.Timestamp.dt.time < dt.time(7,0)))
df[cond] = df[cond].fillna(0,axis=1)
Shows that the na before 7am fill with 0