
Create a new variable from existing variables efficiently in Python


I am trying to recode variables. I have been able to do this with map; however, I am trying to figure out an efficient way to recode multiple values (a, b, c) into a single value. In my example below, I have three different classifications for Asian and would like to recode them all into one category. I tried combining the labels with a boolean OR (|), but I get the error below.

df['Race'] = df['Race'].map({ 
    'Black or African American' : 'Black', 
    'White' : 'White', 
    'Hispanic or Latino': 'Non-White Hispanic', 
    ('Asian' | 'Asian/Indian/Pacific Islander' | 'Native Hawaiian or Other Pacific Islander') : 'Asian/Pacific Islander', 
    ('American Indian or Alaska Native' | 'Other/Mixed') : 'Multiracial/other', 
    'Unspecified' : np.nan
})

TypeError: unsupported operand type(s) for |: 'str' and 'str'

Is there an easier yet still efficient way of recoding multiple values to a single value? It does not have to be map; that is just what I am most familiar with.
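For reference, here is a minimal sketch of why the attempt above fails. The | operator is not defined between two Python strings, so building the dictionary key raises a TypeError before map is ever called. Wrapping the labels in a tuple instead would not raise, but a tuple is a single key that only matches that exact tuple, never the individual strings inside it. The variable names here are illustrative; the labels come from the question.

```python
# `|` has no meaning between two str objects, so building the key raises TypeError
error_message = None
try:
    'Asian' | 'Other/Mixed'
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # unsupported operand type(s) for |: 'str' and 'str'

# A tuple key is legal, but it is ONE key: it matches only that exact tuple,
# never the individual strings inside it
tuple_keyed = {('Asian', 'Asian/Indian/Pacific Islander'): 'Asian/Pacific Islander'}
print('Asian' in tuple_keyed)    # False
print(tuple_keyed.get('Asian'))  # None
```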


Solution

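  • How about using dictionary comprehensions with ** unpacking? Each comprehension maps every label in its tuple to the same new category, and ** merges those pairs into the dictionary passed to map: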

    import numpy as np  # provides np.nan for the 'Unspecified' category

    df['Race'] = df['Race'].map({
        'Black or African American': 'Black',
        'White': 'White',
        'Hispanic or Latino': 'Non-White Hispanic',
        # several source labels -> one target label
        **{i: 'Asian/Pacific Islander' for i in ('Asian', 'Asian/Indian/Pacific Islander', 'Native Hawaiian or Other Pacific Islander')},
        **{i: 'Multiracial/other' for i in ('American Indian or Alaska Native', 'Other/Mixed')},
        'Unspecified': np.nan,
    })
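A variant of the same dictionary-comprehension idea, offered as a sketch: when several target categories each have many source labels, it can be easier to write the mapping grouped by target and invert it with a nested comprehension. The names groups and recode and the sample rows are hypothetical; the labels come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: list each target category once, with every source label that feeds it
groups = {
    'Black': ['Black or African American'],
    'White': ['White'],
    'Non-White Hispanic': ['Hispanic or Latino'],
    'Asian/Pacific Islander': ['Asian', 'Asian/Indian/Pacific Islander',
                               'Native Hawaiian or Other Pacific Islander'],
    'Multiracial/other': ['American Indian or Alaska Native', 'Other/Mixed'],
}

# Invert it: one {source_label: target} pair per source label
recode = {old: new for new, olds in groups.items() for old in olds}
recode['Unspecified'] = np.nan  # explicit missing value, as in the answer above

# Made-up sample rows for illustration
df = pd.DataFrame({'Race': ['Asian', 'Other/Mixed', 'White', 'Unspecified',
                            'Native Hawaiian or Other Pacific Islander']})
df['Race'] = df['Race'].map(recode)
print(df['Race'].tolist())
```

Note that a label appearing in neither list, such as a misspelling, also comes back as NaN, because Series.map returns NaN for any value missing from a dict mapper. Series.replace with the same dictionary would leave such values unchanged instead.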