python, pandas, fillna

A dataset with int64, float64 and datetime64[ns] columns gets converted to object after applying the pandas fillna method


I am using a dataset from Kaggle (https://www.kaggle.com/datasets/claytonmiller/lbnl-automated-fault-detection-for-buildings-data).

The dataset has int64, float64, and datetime64[ns] columns; after I use the pandas fillna method, however, every column changes to the object dtype.

Could anyone tell me what I need to do to retain the original data types after calling fillna?

The following is the code I use:

import pandas as pd
import datetime as dt
%matplotlib inline
df = pd.read_csv('RTU.csv') 
df['Timestamp'] = pd.to_datetime(df['Timestamp'])

The data types initially

If I run df.dtypes at this point, I see the correct data types. However, after the following lines of code, every column changes to the object dtype.

df['Timestamp'] = pd.to_datetime(df['Timestamp'])
def fault_mapper_FD(faultDate):
    if pd.Timestamp(2017, 8, 27, 0) <= faultDate <= pd.Timestamp(2017, 8, 28, 0):
        return 0
    if pd.Timestamp(2017, 8, 29, 0) <= faultDate <= pd.Timestamp(2017, 8, 29, 23, 59):
        return 0
    if pd.Timestamp(2017, 12, 1, 0) <= faultDate <= pd.Timestamp(2017, 12, 1, 23, 59):
        return 0
    if pd.Timestamp(2017, 12, 3, 0) <= faultDate <= pd.Timestamp(2017, 12, 3, 23, 59):
        return 0
    if pd.Timestamp(2017, 12, 7, 0) <= faultDate <= pd.Timestamp(2017, 12, 8, 0):
        return 0
    if pd.Timestamp(2017, 12, 14, 0) <= faultDate <= pd.Timestamp(2017, 12, 14, 23, 59):
        return 0
    if pd.Timestamp(2018, 2, 7, 0) <= faultDate <= pd.Timestamp(2018, 2, 7, 23, 59):
        return 0
    if pd.Timestamp(2018, 2, 9, 0) <= faultDate <= pd.Timestamp(2018, 2, 9, 23, 59):
        return 0
    if pd.Timestamp(2017, 12, 20, 0) <= faultDate <= pd.Timestamp(2017, 12, 20, 23, 59):
        return 0
    if pd.Timestamp(2018, 2, 18, 0) <= faultDate <= pd.Timestamp(2018, 2, 18, 23, 59):
        return 0
    if pd.Timestamp(2018, 2, 1, 0) <= faultDate <= pd.Timestamp(2018, 2, 1, 23, 59):
        return 0
    if pd.Timestamp(2018, 1, 31, 0) <= faultDate <= pd.Timestamp(2018, 1, 31, 23, 59):
        return 0
    if pd.Timestamp(2018, 1, 28, 0) <= faultDate <= pd.Timestamp(2018, 1, 28, 23, 59):
        return 0
    if pd.Timestamp(2018, 1, 27, 0) <= faultDate <= pd.Timestamp(2018, 1, 27, 23, 59):
        return 0
    if (pd.Timestamp(2017, 9, 1, 0) <= faultDate <= pd.Timestamp(2017, 9, 1, 23, 59) or 
    pd.Timestamp(2017, 11, 30, 0) <= faultDate <= pd.Timestamp(2017, 11, 30, 23, 59) or 
    pd.Timestamp(2017, 12, 9, 0) <= faultDate <= pd.Timestamp(2017, 12, 9, 23, 59) or 
    pd.Timestamp(2017, 12, 10, 0) <= faultDate <= pd.Timestamp(2017, 12, 11, 0) or 
    pd.Timestamp(2017, 12, 24, 0) <= faultDate <= pd.Timestamp(2017, 12, 24, 23, 59) or 
    pd.Timestamp(2018, 2, 4, 0) <= faultDate <= pd.Timestamp(2018, 2, 4, 23, 59) or 
    pd.Timestamp(2018, 2, 5, 0) <= faultDate <= pd.Timestamp(2018, 2, 6, 0)):
        return 1

df['FD'] = df['Timestamp'].apply(lambda fault_date: fault_mapper_FD(fault_date))

cond = (df.Timestamp.dt.time > dt.time(22,0)) | ((df.Timestamp.dt.time < dt.time(7,0)))
df[cond] = df[cond].fillna(0,axis=1)

Now df.dtypes reports all of my columns as object.

The data types after the pandas fillna method


Solution

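  • I think you have a small typo: fillna is called with axis=1, which makes pandas work row by row. Each row mixes datetime, integer and float values, so the result most likely falls back to object. You just need to call

    df = df[cond].fillna(0,axis=0)
    

    which indeed doesn't change the datatypes: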

    Timestamp                               datetime64[ns]
    RTU: Supply Air Temperature                    float64
    RTU: Return Air Temperature                    float64
    RTU: Supply Air Fan Status                       int64
    RTU: Circuit 1 Discharge Temperature           float64
                                                 ...      
    VAV Box: Room 203 Air Temperature              float64
    VAV Box: Room 204 Air Temperature              float64
    VAV Box: Room 205 Air Temperature              float64
    VAV Box: Room 206 Air Temperature              float64
    Fault Detection Ground Truth                     int64
    Length: 69, dtype: object
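
Note that the line in the answer reassigns df to df[cond], so rows outside the night-time window are dropped. If the goal is to keep every row and fill only the ones matching cond, one option is to write the column-wise fill back through .loc. The following is a minimal sketch on a made-up three-row frame: the column names temp and fan_status are placeholders, not columns from RTU.csv. What axis=1 does to the dtypes may vary between pandas versions, so the sketch only prints them rather than assuming a result.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Hypothetical stand-in for RTU.csv: one datetime, one float and one int column.
df = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(
            ["2017-08-27 06:00", "2017-08-27 12:00", "2017-08-27 23:00"]
        ),
        "temp": [np.nan, np.nan, 21.5],
        "fan_status": [1, 0, 1],
    }
)
original_dtypes = df.dtypes.copy()

# Same night-time condition as in the question.
cond = (df.Timestamp.dt.time > dt.time(22, 0)) | (df.Timestamp.dt.time < dt.time(7, 0))

# Row-wise fill, as in the question: inspect the dtypes it produces.
row_wise = df[cond].fillna(0, axis=1)
print(row_wise.dtypes)

# Column-wise fill (the default axis), written back only to the matching rows.
df.loc[cond] = df.loc[cond].fillna(0)
print(df.dtypes)
```

In this sketch the 12:00 row is outside the window, so its missing temp value stays NaN, while the 06:00 row is filled with 0 and all three rows remain in df.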