python machine-learning artificial-intelligence categorical-data feature-engineering

Feature Engineering Salary Data using Categorical Column as a condition


I need to convert salary amounts to annualised salaries based on a categorical column, sal_md, which encodes the pay period:

  • 'M' - monthly
  • 'Y' - yearly
  • 'W' - weekly
  • 'B' - biweekly (every two weeks)

import pandas as pd

df = pd.DataFrame({'Name':['A','B','C','D','E'],
                  'sal_amt':[4500,50000,2000,3000,5000],
                  'sal_md':['M','Y','W','B','M']})
df.head()

# I defined a function for this and applied it row by row:

def func(row):
    if row['sal_md'] == 'M':
        return (row['sal_amt']*12)
    elif row['sal_md'] =='Y':
        return row['sal_amt'] 
    elif row['sal_md'] == 'H':
        return (row['sal_amt']*8760)
    elif row['sal_md'] == 'W':
        return (row['sal_amt']*52)
    elif row['sal_md'] == 'B':
        return (row['sal_amt']*26)
    elif row['sal_md'] == 'S':
        return row['sal_amt']
    elif row['sal_md'] == 'A':
        return row['sal_amt']


df['sal_annual'] = df.apply(func,axis=1)

https://i.sstatic.net/INXva.png


Solution

    In [1]: import pandas as pd
    
    In [2]: df = pd.DataFrame({'Name':['A','B','C','D','E'],
                          'sal_amt':[4500,50000,2000,3000,5000],
                          'sal_md':['M','Y','W','B','M']})
    
    In [3]: multiplier_dict = {'M':12, 'Y':1, 'W':52, 'B':26}
    
    In [4]: df['sal_multiplier'] = df.sal_md.map(multiplier_dict)
    
    In [5]: df['sal_annual'] = df.sal_amt*df.sal_multiplier
    
    In [6]: df.head()
    Out[6]:
      Name  sal_amt sal_md  sal_multiplier  sal_annual
    0    A     4500      M              12       54000
    1    B    50000      Y               1       50000
    2    C     2000      W              52      104000
    3    D     3000      B              26       78000
    4    E     5000      M              12       60000
    

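This isn't exactly what you asked for, but it solves your problem in a simple, Pythonic way: map each sal_md code to its multiplier with a dict, then multiply the two columns. Note that any code missing from the dict maps to NaN, so sal_annual will be NaN for those rows.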
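
If you want to cover every code from the question's func() and skip the helper column, here is a minimal sketch of the same dict-and-map idea. The extra entries ('H', 'S', 'A') and their multipliers are copied from the question's function, not from the answer above, and the check for unmapped codes is my own addition, so treat it as one possible approach rather than the definitive one.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E'],
                   'sal_amt': [4500, 50000, 2000, 3000, 5000],
                   'sal_md': ['M', 'Y', 'W', 'B', 'M']})

# Multipliers taken from the question's func(). 'H', 'S' and 'A' appear there
# but not in the sample data; 8760 (= 24 * 365) is the question's own value.
multiplier_dict = {'M': 12, 'Y': 1, 'H': 8760, 'W': 52, 'B': 26, 'S': 1, 'A': 1}

# Look up the multiplier for each row without storing it as a column.
multiplier = df['sal_md'].map(multiplier_dict)

# Codes missing from the dict come back as NaN; fail loudly instead of
# silently producing NaN salaries (assumption: an unknown code is an error).
unmapped = df.loc[multiplier.isna(), 'sal_md'].unique()
if len(unmapped) > 0:
    raise ValueError(f"No multiplier for sal_md codes: {list(unmapped)}")

df['sal_annual'] = df['sal_amt'] * multiplier
```

Because map and the multiplication are vectorised, this avoids calling a Python function once per row the way df.apply(func, axis=1) does, which usually matters on larger frames.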