I am using this Kaggle dataset: https://www.kaggle.com/datasets/claytonmiller/lbnl-automated-fault-detection-for-buildings-data
It has columns with Int64, Float64, and datetime64[ns] datatypes; after using the pandas fillna method, however, every column changes to the object datatype.
Could anyone tell me what I need to do to retain the original data types after calling fillna?
This is the code I use:
import pandas as pd
import datetime as dt
%matplotlib inline
df = pd.read_csv('RTU.csv')
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
If I run df.dtypes at this point, I can see the correct datatypes.
However, after the following lines of code, every column changes to the object datatype.
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
def fault_mapper_FD(faultDate):
    if pd.Timestamp(2017, 8, 27, 0) <= faultDate <= pd.Timestamp(2017, 8, 28, 0):
        return 0
    if pd.Timestamp(2017, 8, 29, 0) <= faultDate <= pd.Timestamp(2017, 8, 29, 23, 59):
        return 0
    if pd.Timestamp(2017, 12, 1, 0) <= faultDate <= pd.Timestamp(2017, 12, 1, 23, 59):
        return 0
    if pd.Timestamp(2017, 12, 3, 0) <= faultDate <= pd.Timestamp(2017, 12, 3, 23, 59):
        return 0
    if pd.Timestamp(2017, 12, 7, 0) <= faultDate <= pd.Timestamp(2017, 12, 8, 0):
        return 0
    if pd.Timestamp(2017, 12, 14, 0) <= faultDate <= pd.Timestamp(2017, 12, 14, 23, 59):
        return 0
    if pd.Timestamp(2018, 2, 7, 0) <= faultDate <= pd.Timestamp(2018, 2, 7, 23, 59):
        return 0
    if pd.Timestamp(2018, 2, 9, 0) <= faultDate <= pd.Timestamp(2018, 2, 9, 23, 59):
        return 0
    if pd.Timestamp(2017, 12, 20, 0) <= faultDate <= pd.Timestamp(2017, 12, 20, 23, 59):
        return 0
    if pd.Timestamp(2018, 2, 18, 0) <= faultDate <= pd.Timestamp(2018, 2, 18, 23, 59):
        return 0
    if pd.Timestamp(2018, 2, 1, 0) <= faultDate <= pd.Timestamp(2018, 2, 1, 23, 59):
        return 0
    if pd.Timestamp(2018, 1, 31, 0) <= faultDate <= pd.Timestamp(2018, 1, 31, 23, 59):
        return 0
    if pd.Timestamp(2018, 1, 28, 0) <= faultDate <= pd.Timestamp(2018, 1, 28, 23, 59):
        return 0
    if pd.Timestamp(2018, 1, 27, 0) <= faultDate <= pd.Timestamp(2018, 1, 27, 23, 59):
        return 0
    if (pd.Timestamp(2017, 9, 1, 0) <= faultDate <= pd.Timestamp(2017, 9, 1, 23, 59) or
            pd.Timestamp(2017, 11, 30, 0) <= faultDate <= pd.Timestamp(2017, 11, 30, 23, 59) or
            pd.Timestamp(2017, 12, 9, 0) <= faultDate <= pd.Timestamp(2017, 12, 9, 23, 59) or
            pd.Timestamp(2017, 12, 10, 0) <= faultDate <= pd.Timestamp(2017, 12, 11, 0) or
            pd.Timestamp(2017, 12, 24, 0) <= faultDate <= pd.Timestamp(2017, 12, 24, 23, 59) or
            pd.Timestamp(2018, 2, 4, 0) <= faultDate <= pd.Timestamp(2018, 2, 4, 23, 59) or
            pd.Timestamp(2018, 2, 5, 0) <= faultDate <= pd.Timestamp(2018, 2, 6, 0)):
        return 1
df['FD'] = df['Timestamp'].apply(lambda fault_date: fault_mapper_FD(fault_date))
cond = (df.Timestamp.dt.time > dt.time(22,0)) | ((df.Timestamp.dt.time < dt.time(7,0)))
df[cond] = df[cond].fillna(0,axis=1)
Now df.dtypes reports all of my columns as object.
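I think you have a small typo. You just need to fill along axis=0 instead of axis=1:
df = df[cond].fillna(0, axis=0)
which indeed doesn't change the datatypes (note that this assignment keeps only the rows selected by cond):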
Timestamp                               datetime64[ns]
RTU: Supply Air Temperature                    float64
RTU: Return Air Temperature                    float64
RTU: Supply Air Fan Status                       int64
RTU: Circuit 1 Discharge Temperature           float64
...
VAV Box: Room 203 Air Temperature              float64
VAV Box: Room 204 Air Temperature              float64
VAV Box: Room 205 Air Temperature              float64
VAV Box: Room 206 Air Temperature              float64
Fault Detection Ground Truth                     int64
Length: 69, dtype: object
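If you want to keep every row of df and only fill the night-time ones, filling one column at a time is one way to stay off the row-wise path. As far as I know, fillna with a scalar and axis=1 transposes the frame internally, and transposing columns of mixed dtypes produces object columns, which would explain what you are seeing. Below is a minimal sketch on a made-up three-row frame: the temp and fan_status columns are hypothetical stand-ins for the real RTU columns, and only numeric columns are touched so the Timestamp column is left alone.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Toy stand-in for RTU.csv; column names and values are made up for illustration.
df = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(
            ["2017-08-27 06:00", "2017-08-27 12:00", "2017-08-27 23:00"]
        ),
        "temp": [np.nan, 21.5, np.nan],
        "fan_status": [1, 0, 1],
    }
)
original_dtypes = df.dtypes.copy()

# Night-time rows: later than 22:00 or earlier than 07:00 (same rule as in the question).
times = df["Timestamp"].dt.time
cond = (times > dt.time(22, 0)) | (times < dt.time(7, 0))

# Fill NaN with 0 one column at a time, only in the selected rows.
# Each column is handled on its own, so no row-wise operation has to mix dtypes.
numeric_cols = df.select_dtypes(include="number").columns
for col in numeric_cols:
    df.loc[cond, col] = df.loc[cond, col].fillna(0)

print(df.dtypes)
```

Unlike df = df[cond].fillna(0, axis=0), this keeps the daytime rows in df and leaves their NaN values untouched.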