T-test on the means with pandas

I'm working with the MovieLens dataset and I would like to run a t-test on the mean rating of the male and female users.

import pandas as pd
from scipy.stats import ttest_ind

users_table_names = ['user_id', 'age', 'gender', 'occupation', 'zip_code']
users = pd.read_csv('ml-100k/u.user', sep='|', names=users_table_names)
ratings_table_names = ['user_id', 'item_id', 'rating', 'timestamp']
ratings = pd.read_csv('ml-100k/u.data', sep='\t', names=ratings_table_names)
rating_df = pd.merge(users, ratings)

males = rating_df[rating_df['gender']=='M']
females = rating_df[rating_df['gender']=='F']

ttest_ind(males.rating, females.rating)

And I get the following result:

Ttest_indResult(statistic=-0.27246234775012407, pvalue=0.7852671011802962)

Is this the correct way to do this operation? The results seem a bit odd.

Thank you in advance!

Solution

Your code runs a two-sided t-test that assumes the two populations have identical variances: you did not specify the equal_var parameter, and it defaults to True in SciPy's ttest_ind().

So you can state your statistical test as:

Null hypothesis (H0): there is no difference between the mean ratings of male and female users, in other words the means are equal (µMale == µFemale).
Alternative hypothesis (H1): there is a difference between the mean ratings of male and female users, in other words the means are not equal (this covers both µMale > µFemale and µMale < µFemale, or simply µMale != µFemale).

The significance level is a threshold you choose before running the test, such as 0.05. If you had obtained a p-value smaller than your significance level, you could reject the null hypothesis (H0) in favor of the alternative hypothesis (H1).

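In your results, the p-value is ~0.79, so you can't reject H0. The data show no statistically significant difference between the two means. Note that this is not proof that the means are equal, only that the test found no evidence against it.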
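The decision rule described above can be sketched as a small helper. The function name decide and the default alpha of 0.05 are illustrative assumptions, not part of SciPy; the p-value passed in is the one reported in the question.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value against a significance level chosen in advance.

    Hypothetical helper for illustration; alpha=0.05 is a common
    convention, not a requirement.
    """
    if p_value < alpha:
        return "reject H0"
    # A large p-value is a lack of evidence against H0, not proof of H0
    return "fail to reject H0"


# p-value reported in the question
print(decide(0.7852671011802962))  # prints "fail to reject H0"
```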

Given the sample standard deviations below, you could also run the test with equal_var=False (Welch's t-test), which does not assume equal variances:

>>> males.rating.std()
1.1095557786889139
>>> females.rating.std()
1.1709514829100405

>>> ttest_ind(males.rating, females.rating, equal_var=False)
Ttest_indResult(statistic=-0.2654398046364026, pvalue=0.7906719538136853)

This test also fails to reject the null hypothesis (H0).

If you use the statsmodels ttest_ind(), you also get the degrees of freedom used in the t-test:

>>> import statsmodels.api as sm
>>> sm.stats.ttest_ind(males.rating, females.rating, alternative='two-sided', usevar='unequal')
(-0.2654398046364028, 0.790671953813685, 42815.86745494558)
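The tuple returned by statsmodels is (t statistic, p-value, degrees of freedom). To make the first and third values less opaque, here is a minimal sketch of how the unequal-variance (Welch) statistic and the Welch-Satterthwaite degrees of freedom are computed, using only the standard library. The helper name welch_t and the two small samples are made up for illustration; they are not MovieLens data. The p-value is left out because it needs the t-distribution CDF, which is what SciPy and statsmodels supply.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Hypothetical helper for illustration only.
    """
    n_a, n_b = len(a), len(b)
    # Squared standard error of each sample mean (sample variance uses n - 1)
    se2_a = variance(a) / n_a
    se2_b = variance(b) / n_b
    # Difference of means scaled by the combined standard error
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Made-up ratings, not taken from the dataset
group_a = [4, 3, 5, 4, 2, 4]
group_b = [3, 4, 4, 5, 1, 3, 5, 2]
t_stat, dof = welch_t(group_a, group_b)
print(t_stat, dof)
```

On the real data you would pass males.rating and females.rating instead, and the result should agree with the statistic and degrees of freedom shown above up to floating-point rounding.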

What exactly did you find odd about your results?