I have the below data frame
ipdb> csv_data
country_edited sale_edited date_edited transformation_edited
0 India 403171 20090101 10
1 Bhutan 394096 20090101 20
2 Nepal Set Null 20090101 30
3 madhya 355883 20090101 40
4 sudan Set Null 20090101 50
I want to replace all the column values that contain Set Null
to Nan
and so i approached below way
import numpy
def set_NaN(element):
if element == 'Set Null':
return numpy.nan
else:
return element
csv_data = csv_data.applymap(lambda element: set_NaN(element))
But it does not changes anything
ipdb> print csv_data
country_edited sale_edited date_edited transformation_edited
0 India 403171 20090101 10
1 Bhutan 394096 20090101 20
2 Nepal Set Null 20090101 30
3 madhya 355883 20090101 40
4 sudan Set Null 20090101 50
ipdb>
But when i print only csv_data.applymap(lambda element: set_NaN(element))
as below i can see the output, but when assigned back i can't get the data i intended to
ipdb> csv_data.applymap(lambda element: set_NaN(element))
country_edited sale_edited date_edited transformation_edited
0 India 403171 20090101 10
1 Bhutan 394096 20090101 20
2 Nepal NaN 20090101 30
3 madhya 355883 20090101 40
4 sudan NaN 20090101 50
So how to replace the column values with NaN based on certain string ?
You need DataFrame.mask
, it replace True
values of mask by NaN
. Also some columns are numeric, so need cast values of df
to string
first:
print (csv_data.astype(str) == 'Set Null')
country_edited sale_edited date_edited transformation_edited
0 False False False False
1 False False False False
2 False True False False
3 False False False False
4 False True False False
csv_data = csv_data.mask(csv_data.astype(str) == 'Set Null')
print (csv_data)
country_edited sale_edited date_edited transformation_edited
0 India 403171 20090101 10
1 Bhutan 394096 20090101 20
2 Nepal NaN 20090101 30
3 madhya 355883 20090101 40
4 sudan NaN 20090101 50
Another solution with numpy boolean mask
- compare numpy array by DataFrame.values
:
print (csv_data.values == 'Set Null')
[[False False False False]
[False False False False]
[False True False False]
[False False False False]
[False True False False]]
csv_data = csv_data.mask(csv_data.values == 'Set Null')
print (csv_data)
country_edited sale_edited date_edited transformation_edited
0 India 403171 20090101 10
1 Bhutan 394096 20090101 20
2 Nepal NaN 20090101 30
3 madhya 355883 20090101 40
4 sudan NaN 20090101 50
In your solution is necessary assign data back to csv_data
:
def set_NaN(element):
if element == 'Set Null':
return numpy.nan
else:
return element
csv_data = csv_data.applymap(lambda element: set_NaN(element))
print (csv_data)
country_edited sale_edited date_edited transformation_edited
0 India 403171 20090101 10
1 Bhutan 394096 20090101 20
2 Nepal NaN 20090101 30
3 madhya 355883 20090101 40
4 sudan NaN 20090101 50