Are there different string nan values?

I have a dataframe with two columns whose values I compare row by row. The rows where the two values differ are saved in a new dataframe.

Dataframe before comparing:

other columns	value_a	value_b	other columns
...	12	12	...
...	1.3	1.6	...
...	abc	def	...

Dataframe after comparing:

other columns	value_a	value_b	other columns
...	1.3	1.6	...
...	abc	def	...

The problem is that I also get the following lines:

other columns	value_a	value_b	other columns
...			...
...			...

Empty cells are compared with each other and reported as non-matching.

Now I have created a set for each of the columns value_a and value_b to see which values occur in the columns. I used the following code for this:

df2['non-numeric_a'] = df['value_a'].mask(df['value_a'].notna())

df2['non-numeric_b'] = df['value_b'].mask(df['value_b'].notna())

Then I converted each column to a set, because I wanted to see the unique values that occur in each column:

print(set(df2['non-numeric_a']))

print(set(df2['non-numeric_b']))

My output for the sets was: {nan} and {nan, nan, nan, ..., nan}

Solution

A NaN is not equal to itself, so a set can only deduplicate NaNs that are the very same object. Separate NaN objects are all kept: set([float('nan'), float('nan')]) -> {nan, nan}
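To make the point above concrete, here is a minimal sketch using only the standard library. The names a, b, same_object_set and distinct_set are purely illustrative. It relies on the fact that a set checks identity before equality, so the same NaN object is stored once, while two separate NaN objects are both kept.

```python
import math

a = float('nan')
b = float('nan')  # a second, separate NaN object

# NaN never compares equal, not even to itself
print(a == a)         # False
print(math.isnan(a))  # True: the reliable way to test for NaN

# Set lookup checks identity first, then equality
same_object_set = {a, a}  # same object twice -> stored once
distinct_set = {a, b}     # two objects, a == b is False -> both stored

print(len(same_object_set))  # 1
print(len(distinct_set))     # 2
```

This may also explain why one column can print as {nan} and another as {nan, nan, ...}: it depends on whether the NaNs in the column are the same object or separate ones.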

Instead, drop the NaN values before converting to a set:

set(df2['non-numeric_b'].dropna())

Or use unique(), which collapses all NaNs into a single one:

set(df2['non-numeric_b'].unique())

Example:

import pandas as pd

s = pd.Series([float('nan'), float('nan'), 1, 1, 2, 3])

set(s)
# {nan, nan, 1.0, 2.0, 3.0}

set(s.dropna())
# {1.0, 2.0, 3.0}

set(s.unique())
# {nan, 1.0, 2.0, 3.0}
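
The same NaN behaviour is likely what makes the empty rows show up in the comparison from the question: NaN != NaN is True, so two empty cells count as different. A minimal sketch of one way to exclude them, assuming the comparison is a plain element-wise inequality. The sample data is made up to mirror the question's table, and the names both_missing, differs and result are illustrative, not taken from the original code.

```python
import numpy as np
import pandas as pd

# Made-up data: the three rows from the question plus one row of empty cells
df = pd.DataFrame({
    'value_a': [12, 1.3, 'abc', np.nan],
    'value_b': [12, 1.6, 'def', np.nan],
})

# Rows where both cells are empty should not count as a mismatch
both_missing = df['value_a'].isna() & df['value_b'].isna()

# Element-wise inequality, minus the rows where both sides are NaN
differs = df['value_a'].ne(df['value_b']) & ~both_missing

result = df[differs]
print(result)
```

With this sample data only the 1.3/1.6 and abc/def rows remain. Rows where just one side is empty are still reported as different, which is usually the desired behaviour.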