python, pandas, p-value, t-test

How do I perform a T-test from a dataframe?


I want to do a t-test for the means of hourly wages of male and female staff.

`df1 = df[["gender","hourly_wage"]] #creating a sub-dataframe with only the columns of gender and hourly wage
staff_wages=df1.groupby(['gender']).mean() #grouping the data frame by gender and assigning it to a new variable 'staff_wages'
staff_wages.head()`

Truth is, I think I got confused halfway. I wanted to do a t-test, so I wrote this code:

`mean_val_salary_female = df1[staff_wages['gender'] == 'female'].mean()
mean_val_salary_female = df1[staff_wages['gender'] == 'male'].mean()

t_val, p_val = stats.ttest_ind(mean_val_salary_female, mean_val_salary_male)

# obtain a one-tail p-value
p_val /= 2

print(f"t-value: {t_val}, p-value: {p_val}")`

It only returns errors.

I sort of went crazy trying different things...

`#married_vs_dependents = df[['married', 'num_dependents', 'years_in_employment']]


#married_vs_dependents = df[['married', 'num_dependents', 'years_in_employment']]
#married_vs_dependents.head()

#my_data = df(married_vs_dependents)
#my_data.groupby('married').mean()

mean_gender = df.groupby("gender")["hourly_wage"].mean()
married_vs_dependents.head()

mean_gender.groupby('gender').mean()

mean_val_salary_female = df[staff_wages['gender'] == 'female'].mean()
mean_val_salary_female = df[staff_wages['gender'] == 'male'].mean()

#cat1 = mean_gender['male']==['cat1']
#cat2 = mean_gender['female']==['cat2']

ttest_ind(cat1['gender'], cat2['hourly_wage'])`

Could someone please guide me to the right approach?


Solution

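  • `ttest_ind` expects the raw samples of each group as its `a` and `b` arguments (array-like), as stated in the documentation, but you're passing each group's mean. Your code also fails before it reaches the test: after `groupby(['gender']).mean()`, `gender` is the index of `staff_wages` rather than a column, so `staff_wages['gender']` raises a `KeyError`, and `mean_val_salary_male` is never defined because both assignments go to `mean_val_salary_female`. Select each group's `hourly_wage` values from the original dataframe and pass those arrays instead: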


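    from scipy import stats

    df1 = df[["gender", "hourly_wage"]]

    # Pass the raw hourly wages of each group, not their means
    m = df1.loc[df1["gender"].eq("male"), "hourly_wage"].to_numpy()
    f = df1.loc[df1["gender"].eq("female"), "hourly_wage"].to_numpy()

    t_val, p_val = stats.ttest_ind(m, f)  # p-value is two-sided by default
    print(f"t-value: {t_val}, p-value: {p_val}")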
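
For anyone who wants to run this end to end, here is a minimal, self-contained sketch. The dataframe is made up for illustration (the real `df` is not shown in the question), and using Welch's test (`equal_var=False`) is an assumption on my part, not something the question specifies. It also shows one way to get the one-tailed p-value the question was after, through the `alternative` argument, which needs SciPy 1.6 or newer. Halving the two-sided p-value gives the same number only when the t statistic points in the hypothesized direction.

```python
import pandas as pd
from scipy import stats

# Hypothetical stand-in for the asker's `df`; the wage values are invented.
df = pd.DataFrame({
    "gender": ["male", "female", "male", "female",
               "male", "female", "male", "female"],
    "hourly_wage": [21.0, 19.5, 24.0, 20.0, 22.5, 18.0, 23.0, 21.5],
})

# Raw samples per group (NaNs dropped), not the group means.
male = df.loc[df["gender"] == "male", "hourly_wage"].dropna().to_numpy()
female = df.loc[df["gender"] == "female", "hourly_wage"].dropna().to_numpy()

# Two-sided test. equal_var=False selects Welch's t-test (an assumption here).
two_sided = stats.ttest_ind(male, female, equal_var=False)

# One-sided test of H1: mean male wage > mean female wage (SciPy >= 1.6).
one_sided = stats.ttest_ind(male, female, equal_var=False,
                            alternative="greater")

print(f"two-sided: t={two_sided.statistic:.3f}, p={two_sided.pvalue:.4f}")
print(f"one-sided: t={one_sided.statistic:.3f}, p={one_sided.pvalue:.4f}")
```

If your SciPy version is older than 1.6, dividing the two-sided p-value by 2 (as in the question) is equivalent, provided you first check that the sign of the t statistic matches the direction you are testing.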