Tags: python, pandas, replace, collections, defaultdict

pandas Series.replace() not generating default value from defaultdict


I could always call .fillna() afterwards, but I'm trying to include a value for "OTHER" in the recoding dict itself. I thought a defaultdict might be a good fit, but it seems to create its default values lazily, only when a key is actually looked up, and pandas Series.replace() does not seem to produce results for keys that were not requested earlier in the code.

Example code:

import pandas as pd
from collections import defaultdict

recode = defaultdict(lambda:"Unknown", {
    1 : "Yes",
    2 : "No"
})

print("key 0:", recode[0]) # Will generate a key-value for the key "0"

df = pd.DataFrame(pd.Series([0,1,2,5]), columns = ["code"])
df['answer'] = df['code'].replace(recode)
print(df)

Will generate this output:

key 0: Unknown
   code   answer
0     0  Unknown
1     1      Yes
2     2       No
3     5        5

So since we called print() on recode[0], that key gets generated and can be used by pd.Series.replace(). But recode[5] is only ever looked for by pd.Series.replace(), so 5 is not replaced by "Unknown" as I expected.

Suggestions? (on how to include an "OTHER" within the recode-datastructure)

Accepted Answer

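Building on Anurag Dabas's answer, you can just use map(). Unlike replace(), Series.map() honours the default of a dict subclass that defines __missing__ (such as defaultdict), so values that are not keys in the dict get the default instead of being left unchanged.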

recode = defaultdict(lambda:"Unknown", {
    1 : "Yes",
    2 : "No",
    None: "Ah shit"
})
df['answer'] = df['code'].map(recode)

Output:

   code   answer
0     0  Unknown
1     1      Yes
2     2       No
3     5  Unknown
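
As a self-contained sketch of the same idea (the names codes and answers are illustrative and not from the original post), the lookup can be checked on a bare Series. It assumes only that pandas is installed:

```python
import pandas as pd
from collections import defaultdict

# Same recoding table as in the question: unknown codes fall back to "Unknown".
recode = defaultdict(lambda: "Unknown", {1: "Yes", 2: "No"})

codes = pd.Series([0, 1, 2, 5], name="code")

# map() looks every value up in the dict; for a defaultdict the missing
# keys (0 and 5 here) get the value produced by the default factory.
answers = codes.map(recode)

print(answers.tolist())  # ['Unknown', 'Yes', 'No', 'Unknown']
```

Note that this relies on map() looking up every value, which is different from replace(), which only considers the keys already stored in the dict.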

Solution

  • When you do:

    print("key 0:", recode[0])
    

    Since no key 0 exists in recode, this lookup creates key 0 with the value 'Unknown', which is the default produced by the factory function passed to the defaultdict.

    So recode now becomes:

    print(recode)
    defaultdict(<function __main__.<lambda>()>, {1: 'Yes', 2: 'No', 0: 'Unknown'})
    

    So now if you do:

    df['answer'] = df['code'].replace(recode)
    

    0 is replaced with 'Unknown' because key 0 now exists in the defaultdict recode with the value 'Unknown'. No key 5 exists in the defaultdict, so 5 remains unchanged. You can check that with:

    print('keys: ',recode.keys(),'\nvalues: ',recode.values())
    
    keys:  dict_keys([1, 2, 0]) 
    values:  dict_values(['Yes', 'No', 'Unknown'])
    

    Update:

    You can use a plain dictionary (or a defaultdict) with map() + fillna():

    df['answer'] = df['code'].map({1:'Yes',2:'No'}).fillna('Other')
    

    Output of df:

       code answer
    0     0  Other
    1     1    Yes
    2     2     No
    3     5  Other
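
The key-insertion behaviour described at the start of this answer can be checked without pandas. The following is a minimal sketch using only the standard library; the variable name value is illustrative:

```python
from collections import defaultdict

recode = defaultdict(lambda: "Unknown", {1: "Yes", 2: "No"})

print(sorted(recode))   # [1, 2]  (only the keys given at construction)
print(5 in recode)      # False   (membership tests never insert a key)
print(recode.get(5))    # None    (.get() bypasses the default factory)

value = recode[5]       # indexing a missing key calls the factory...
print(value)            # Unknown
print(sorted(recode))   # [1, 2, 5]  (...and stores the result under that key)
```

Only indexing with recode[key] triggers the default. Anything that just iterates over the stored keys, or uses `in` or .get(), sees the dict as it currently is, which is consistent with replace() leaving 5 untouched in the question.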