This is going to sound like a really stupid issue, but I'm importing file from excel, and noticed that pandas is handling values that are supposed to be NaN differently in VS Code. Here's a visual below...
City | State |
---|---|
'Detroit' | 'MI' |
'Chicago' | NaN |
nan | 'CA' |
This is obviously a small snapshot, but notice that row 3 in the City column is listed as nan and row 2 in the State column is NaN.
The issue is that when I use df.isna()
or df.isnull()
it only picks up on the NaN value in the State column, but not the city column. Unfortunately I can't use keep_na_values = True
in this situation because I need those values to be labeled as NaN so that a .isna()
function can identify them later on. What's more odd is that the cells in the excel file are empty, and on top of that I don't seem to have this issue in Jupyter Notebook.
I've tried using
df.City.replace(to_replace = {'nan': 'NaN'},inplace = True)
but no luck. I also came across this post below which suggests using pd.read_excel(na_values = ['nan'])
however that doesn't allow the .isna()
or .isnull() to pick up on the values.
https://stackoverflow.com/questions/50685107/pandas-dataframe-nan-values-not-replacing
Does anyone have any ideas on how I can get pandas to read all NaN values in this file in one manner rather than multiple? If it's not able to be done then is what it is.
Are you sure that those nan you are referencing are not simply a string of nan? If they are could do a replace and manually set them to np.nan using numpy
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Column_1' : ['A', 'nan', 'C'],
'Column_2' : ['1', '2', np.nan]
})
df = df.replace('nan', np.nan)
df.isna()