Search code examples
pandasnanxlsx

Pandas handling NaN values as 'nan' & 'NaN'


This is going to sound like a really stupid issue, but I'm importing file from excel, and noticed that pandas is handling values that are supposed to be NaN differently in VS Code. Here's a visual below...

City State
'Detroit' 'MI'
'Chicago' NaN
nan 'CA'

This is obviously a small snapshot, but notice that row 3 in the City column is listed as nan and row 2 in the State column is NaN.

The issue is that when I use df.isna() or df.isnull() it only picks up on the NaN value in the State column, but not the city column. Unfortunately I can't use keep_na_values = True in this situation because I need those values to be labeled as NaN so that a .isna() function can identify them later on. What's more odd is that the cells in the excel file are empty, and on top of that I don't seem to have this issue in Jupyter Notebook.

I've tried using

df.City.replace(to_replace = {'nan': 'NaN'},inplace = True) 

but no luck. I also came across this post below which suggests using pd.read_excel(na_values = ['nan']) however that doesn't allow the .isna() or .isnull() to pick up on the values.

https://stackoverflow.com/questions/50685107/pandas-dataframe-nan-values-not-replacing

Does anyone have any ideas on how I can get pandas to read all NaN values in this file in one manner rather than multiple? If it's not able to be done then is what it is.


Solution

  • Are you sure that those nan you are referencing are not simply a string of nan? If they are could do a replace and manually set them to np.nan using numpy

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({
        'Column_1' : ['A', 'nan', 'C'],
        'Column_2' : ['1', '2', np.nan]
    })
    
    df = df.replace('nan', np.nan)
    df.isna()