
Generate birth date based on hire date and paygrade (Python)


In my data I have the hire dates of employees and their paygrades. Paygrades are divided into categories (1 = Intern, 2 = Junior, 3 = Senior, ...).

Based on this data, I'm trying to generate approximate birth dates for these employees, taking into account that an employee would be at least 23 years old when hired.

This is the function I developed:

import math
import random
from datetime import datetime

def generate_birth_date(paygrade, hire_date_str):
    if isinstance(hire_date_str, float) and math.isnan(hire_date_str):
        # Handle the case when hire_date_str is NaN
        return None
    if isinstance(hire_date_str, float):
        hire_date_str = str(int(hire_date_str))

    hire_date = datetime.strptime(hire_date_str, "%y-%m-%d").date()
    if paygrade == 'Intern':
        birth_year = random.randint(1998, 2000)
    elif paygrade == 'Junior':
        birth_year = random.randint(1996, 1998)
    elif paygrade == 'Senior':
        birth_year = random.randint(1994, 1996)
    elif paygrade == 'Manager':
        birth_year = random.randint(1992, 1994)
    elif paygrade == 'Senior Manager':
        birth_year = random.randint(1990, 1992)
    elif paygrade == 'Director':
        birth_year = random.randint(1988, 1990)
    else:
        birth_year = random.randint(1982, 1984)

    birth_month = random.randint(1, 12)
    birth_day = random.randint(1, 28)  # Assuming maximum of 28 days in a month

    birth_date = datetime(birth_year, birth_month, birth_day)

    return birth_date.date()

And this is how I'm calling it:

# Apply the function to the PAY_GRADE and HIRE_DATE columns to generate birth dates
df['BIRTH_DATE'] = df.apply(lambda row: generate_birth_date(row['PAY_GRADE'], row['HIRE_DATE']), axis=1)

The results are not accurate: I feel like the function sometimes takes into account only the paygrade and sometimes only the hire date. For instance, an employee may have been hired in 2006 with paygrade 2, meaning he was a Junior and therefore at least 23 years old at that time, which means he would be about 40 years old or more by now. How can I correct my function to get realistic results?


Solution

  • PROBLEM

    The function parses hire_date but never uses it, so the birth dates are generated from the paygrade alone.

    SOLUTION

    To increase accuracy, include the hire date in the calculation of the birth date.

    First compute the latest allowed birth year from the hire date and the minimum age of 23 by subtracting 23 from the hire year. Anyone born after that year would be younger than 23 when hired, so this value is an upper bound on the birth year. Then, when generating the random birth year for each paygrade, cap the paygrade's maximum birth year at that bound.

    This modification keeps birth years at least 23 years before the hire year while still considering the paygrade categories.
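The bound described above can be sketched as follows. This is a minimal illustration, not the asker's code: `latest_birth_date` and `age_on` are hypothetical helper names, and falling back to Feb 28 when the hire date is Feb 29 is an assumption.

```python
from datetime import date

MIN_AGE = 23  # minimum age at hire, from the question


def latest_birth_date(hire_date: date, min_age: int = MIN_AGE) -> date:
    """Return the latest birth date that makes someone min_age on hire_date."""
    try:
        return hire_date.replace(year=hire_date.year - min_age)
    except ValueError:
        # hire_date is Feb 29 and the target year is not a leap year
        return hire_date.replace(year=hire_date.year - min_age, day=28)


def age_on(birth_date: date, on_date: date) -> int:
    """Age in whole years on on_date."""
    had_birthday = (on_date.month, on_date.day) >= (birth_date.month, birth_date.day)
    return on_date.year - birth_date.year - (0 if had_birthday else 1)


bound = latest_birth_date(date(2006, 7, 15))
print(bound, age_on(bound, date(2006, 7, 15)))  # 1983-07-15 23
```

Any generated birth date on or before `bound` satisfies the age rule; `age_on` can be used to spot-check generated values.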

    EXAMPLE

    Suppose we have an employee with the following information:

    • Paygrade: Junior
    • Hire Date: 2006-07-15

    With the updated function, the latest allowed birth date is the hire date minus 23 years:

    hire_date = datetime.datetime.strptime("2006-07-15", "%Y-%m-%d").date()
    # Shift the year directly; timedelta(days=23 * 365) ignores leap days and lands on 1983-07-21
    latest_birth_date = hire_date.replace(year=hire_date.year - 23)
    

    The latest_birth_date is 1983-07-15: anyone born after it would be younger than 23 on the hire date. Next, we generate a random birth year whose upper bound is the earlier of latest_birth_date.year and the maximum birth year allowed for the Junior paygrade (1998), so here 1983. The lower bound is a modelling choice; this example keeps the two-year spread of the original paygrade ranges:

    latest_year = min(latest_birth_date.year, 1998)
    birth_year = random.randint(latest_year - 2, latest_year)
    

    Suppose the generated birth year is 1982. Finally, we generate random values for the birth month and birth day:

    birth_month = random.randint(1, 12)
    birth_day = random.randint(1, 28)
    

    Let's say the generated birth month is 11 and the birth day is 22.

    Combining the birth year, birth month, and birth day, the generated birth date for this employee is 1982-11-22, which makes them 23 on the hire date. If the draw lands in the boundary year (1983 here) with a month and day later than latest_birth_date, the employee would still be 22 at hire, so redraw or clamp the result to latest_birth_date.

    This keeps the employee's birth date within a range that satisfies the minimum age of 23 at the time of hire, while also considering the paygrade.

    By applying this logic to all employees in your dataset, you can generate approximate birth dates that take into account both paygrades and hire dates.
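Putting the pieces together, one possible version of the function is sketched below. It is an illustration under stated assumptions rather than a definitive fix: the `MAX_BIRTH_YEAR` table reuses the upper ends of the ranges in the question, the two-year `SPREAD_YEARS` window is an arbitrary choice, `shift_years` is a hypothetical helper, and hire dates are assumed to be `YYYY-MM-DD` strings. Drawing from date ordinals instead of separate year, month, and day values keeps every result on or before the bound.

```python
import math
import random
from datetime import date, datetime

MIN_AGE = 23        # minimum age at hire, from the question
SPREAD_YEARS = 2    # assumed width of the birth-date window
# Latest birth year per paygrade, taken from the ranges in the question
MAX_BIRTH_YEAR = {
    "Intern": 2000, "Junior": 1998, "Senior": 1996, "Manager": 1994,
    "Senior Manager": 1992, "Director": 1990,
}
DEFAULT_MAX_BIRTH_YEAR = 1984  # the question's fallback range ends here


def shift_years(d: date, years: int) -> date:
    """Move d back by whole years, mapping Feb 29 to Feb 28 if needed."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def generate_birth_date(paygrade, hire_date_str, rng=random):
    # Missing hire dates arrive from pandas as float NaN
    if isinstance(hire_date_str, float) and math.isnan(hire_date_str):
        return None
    hire_date = datetime.strptime(str(hire_date_str), "%Y-%m-%d").date()

    # Latest date allowed by the age rule and by the paygrade
    by_age = shift_years(hire_date, MIN_AGE)
    by_grade = date(MAX_BIRTH_YEAR.get(paygrade, DEFAULT_MAX_BIRTH_YEAR), 12, 31)
    latest = min(by_age, by_grade)
    earliest = shift_years(latest, SPREAD_YEARS)

    # Pick a uniformly random day between the two bounds (inclusive)
    return date.fromordinal(rng.randint(earliest.toordinal(), latest.toordinal()))


print(generate_birth_date("Junior", "2006-07-15"))
```

The `df.apply(...)` call from the question can stay as it is; only the function body changes. The optional `rng` argument accepts a seeded `random.Random` instance when reproducible output is needed.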