I could always call .fillna() afterwards, but I'm trying to include a value for "OTHER" as part of the recoding dict itself. I thought a defaultdict might be a good fit, but it seems to create its entries lazily, only when a key is looked up, and pandas Series.replace() does not seem to produce results for keys that were not requested earlier in the code.
Example code:
import pandas as pd
from collections import defaultdict
recode = defaultdict(lambda: "Unknown", {
    1: "Yes",
    2: "No",
})
print("key 0:", recode[0])  # Looking up the missing key 0 creates a key-value pair for it
df = pd.DataFrame(pd.Series([0,1,2,5]), columns = ["code"])
df['answer'] = df['code'].replace(recode)
print(df)
Will generate this output:
key 0: Unknown
   code   answer
0     0  Unknown
1     1      Yes
2     2       No
3     5        5
So because recode[0] was looked up in the print() call, that key gets generated and can be used by pd.Series.replace(). recode[5], however, is ONLY searched for by pd.Series.replace(), and is therefore not replaced by "Unknown" as I expected.
Suggestions? (on how to include an "OTHER" within the recode-datastructure)
Accepted Answer
Building on Anurag Dabas's answer, you can just use map instead of replace:
recode = defaultdict(lambda: "Unknown", {
    1: "Yes",
    2: "No",
    None: "Ah shit",
})
df['answer'] = df['code'].map(recode)
Output:
   code   answer
0     0  Unknown
1     1      Yes
2     2       No
3     5  Unknown
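If the side effect of defaultdict (storing every key that gets looked up) is unwanted, one alternative is a small dict subclass that defines __missing__ without inserting anything. The sketch below assumes that Series.map() uses __missing__ for dict subclasses, as the pandas documentation describes; the class name RecodeWithOther and its other parameter are hypothetical, chosen only for illustration.

```python
import pandas as pd


class RecodeWithOther(dict):
    """Hypothetical helper: a dict that returns a fixed label for unknown keys."""

    def __init__(self, mapping, other="Other"):
        super().__init__(mapping)
        self.other = other

    def __missing__(self, key):
        # Return the fallback label without storing the key in the dict.
        return self.other


recode = RecodeWithOther({1: "Yes", 2: "No"}, other="Unknown")
codes = pd.Series([0, 1, 2, 5], name="code")

answers = codes.map(recode)   # unknown codes go through __missing__
print(answers.tolist())
print(len(recode))            # the mapping itself still holds only two keys
```

This keeps the "OTHER" value inside the recoding structure, and the mapping does not grow as new codes are encountered.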
When you do:
print("key 0:", recode[0])
there is no key 0 in recode yet, so the defaultdict calls its default factory and stores the result: key 0 is created with the value 'Unknown'.
So now recode becomes:
print(recode)
defaultdict(<function __main__.<lambda>()>, {1: 'Yes', 2: 'No', 0: 'Unknown'})
So now if you do:
df['answer'] = df['code'].replace(recode)
0 is replaced with 'Unknown' because the key 0 now exists in recode, while 5 is left unchanged because no key 5 exists in it. You can check that with:
print('keys: ',recode.keys(),'\nvalues: ',recode.values())
keys: dict_keys([1, 2, 0])
values: dict_values(['Yes', 'No', 'Unknown'])
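To see this insertion behaviour in isolation, here is a minimal sketch using only the standard library (the variable names are just for illustration). Indexing a missing key calls the default factory and stores the result, whereas a membership test or .get() does not.

```python
from collections import defaultdict

recode = defaultdict(lambda: "Unknown", {1: "Yes", 2: "No"})

keys_before = list(recode)        # only the keys given at construction
found = 5 in recode               # membership test: no key is created
fetched = recode.get(5)           # .get() bypasses the default factory
keys_after_probe = list(recode)   # still unchanged

value = recode[0]                 # indexing calls the factory and stores the result
keys_after_index = list(recode)   # key 0 has now been added

print(keys_before, found, fetched, keys_after_probe)
print(value, keys_after_index)
```

Anything that only inspects the stored keys will therefore see 0 after the lookup but never 5, which matches the replace() result shown in the question.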
Update:
You can use a plain dictionary (or a defaultdict) with map() + fillna():
df['answer'] = df['code'].map({1:'Yes',2:'No'}).fillna('Other')
Output of df:
   code answer
0     0  Other
1     1    Yes
2     2     No
3     5  Other
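If this recoding is needed in several places, the map() + fillna() pattern can be wrapped in a small function. This is only a sketch: the name recode_codes and its other parameter are hypothetical, not part of pandas.

```python
import pandas as pd


def recode_codes(codes, mapping, other="Other"):
    """Hypothetical helper: map known codes and label everything else `other`."""
    # Codes missing from `mapping` become NaN after map(); fillna() relabels them.
    return codes.map(mapping).fillna(other)


df = pd.DataFrame({"code": [0, 1, 2, 5]})
df["answer"] = recode_codes(df["code"], {1: "Yes", 2: "No"})
print(df)
```

One thing to keep in mind with this approach: fillna() fills every NaN in the mapped result, so rows whose code was itself missing would also end up labelled as other.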