Greeting everyone. I have an excel file that I need to clean and fill NaN values according to column data types, like if column data type is object I need to fill "NULL" in that column and if data types is integer or float 0 needs to be filled in those columns.
So far I have tried 2 method to do the job but no luck, here is the first
df = pd.read_excel("myExcel_files.xlsx")
df.select_dtypes(include='int64').fillna(0, inplace=True)
df.select_dtypes(include='float64').fillna(0.0, inplace=True)
df.select_dtypes(include='object').fillna("NULL", inplace=True)
and the output that I get is not an error but a warning and there is no change in data frame
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py:4259: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
**kwargs
df = pd.read_excel("myExcel_files.xlsx")
#get the list of all integer columns
int_cols = list(df.select_dtypes('int64').columns)
#get the list of all float columns
float_cols = list(df.select_dtypes('float64').columns)
#get the list of all object columns
object_cols = list(df.select_dtypes('object').columns)
#looping through if each column to fillna
for i in int_cols:
df[i].fillna(0,inplace=True)
for f in float_cols:
df[f].fillna(0,inplace=True)
for o in object_cols:
df[o].fillna("NULL",inplace=True)
Both of my methods doesn't work. Many thanks for any help or suggestions. Regards -Manish
I think that instead of using select_dtypes
and iterating over columns you can take the .dtypes
of your DF and replace float64's wth 0.0 and objects with "NULL"... you don't need to worry about int64's as they generally won't have missing values to fill (unless you're using pd.NA
or a nullable int type), so you might be able to do a single operation of:
df.fillna(df.dtypes.replace({'float64': 0.0, 'O': 'NULL'}), inplace=True)
You can also add downcast='infer'
so that if you have what can be int64
s in a float64
column, you end up with int64
s, eg given:
df = pd.DataFrame({
'a': [1.0, 2, np.nan, 4],
'b': [np.nan, 'hello', np.nan, 'blah'],
'c': [1.1, 1.2, 1.3, np.nan]
})
Then:
df.fillna(df.dtypes.replace({'float64': 0.0, 'O': 'NULL'}), downcast='infer', inplace=True)
Will give you (note column a
was downcast to int but c
remains float):
a b c
0 1 NULL 1.1
1 2 hello 1.2
2 0 NULL 1.3
3 4 blah 0.0