python statistics data-science hypothesis-test

t testing total plays of users by gender - python

I want to test whether male and female users differ significantly in their total plays per user (example data below):

Example of female entries

female

    users   artist  plays   gender  age
0   48591   sting   12763   f       25.0
1   48591   stars   8192    f       25.0

Sum plays per unique female user

female_user_plays = female.groupby('users').plays.sum()

female_user_plays

users
5         5479
6         3782
7         7521
11        7160

Example of male entries

male
    users   artist         plays    gender  age
51  56496   iron maiden    456      m       28.0
52  56496   elle           407      m       28.0

Sum plays per unique male user

male_user_plays = male.groupby('users').plays.sum()
male_user_plays

users
0         3282
1        25329
2        51522
3         1590

Average plays per gender

Average Total Male Plays: 11880
Average Total Female Plays: 13104
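The per-gender averages above can also be derived in one pass from a single combined frame. The following is only a sketch: the frame `df` and its toy values are made up for illustration, and only the column names are taken from the examples above. It also reports the spread and the number of users per group, since both feed into a t test.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the question's frames.
df = pd.DataFrame({
    "users":  [1, 1, 2, 3, 3, 4],
    "artist": ["a", "b", "a", "c", "d", "a"],
    "plays":  [10, 20, 5, 100, 50, 7],
    "gender": ["f", "f", "f", "m", "m", "m"],
})

# Total plays per user, keeping gender as the outer index level.
user_totals = df.groupby(["gender", "users"])["plays"].sum()

# Mean, sample standard deviation, and user count of the per-user totals.
summary = user_totals.groupby(level="gender").agg(["mean", "std", "count"])
print(summary)
```

Comparing the `std` and `count` columns next to the means gives a first impression of whether a difference in means is large relative to the noise.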

Before trying the t test, I converted each Series into value lists:

female_plays_list = female_user_plays.values.tolist()
male_plays_list = male_user_plays.values.tolist()
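As a side note, the list conversion is optional as far as I know: `scipy.stats.ttest_ind` takes array-like inputs, so NumPy arrays obtained with `Series.to_numpy()` should give the same result as plain lists. A minimal sketch with made-up random data (the Series below are stand-ins, not the real play counts):

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical stand-ins for the per-user totals in the question.
rng = np.random.default_rng(0)
female_user_plays = pd.Series(rng.integers(1, 10000, 500))
male_user_plays = pd.Series(rng.integers(1, 10000, 800))

# Welch's t test on NumPy arrays taken straight from the Series.
from_arrays = ttest_ind(
    female_user_plays.to_numpy(), male_user_plays.to_numpy(), equal_var=False
)

# The same test on Python lists, as done in the question.
from_lists = ttest_ind(
    female_user_plays.values.tolist(),
    male_user_plays.values.tolist(),
    equal_var=False,
)

print(from_arrays.statistic, from_lists.statistic)
```

Both calls run the same computation, so the conversion step is unlikely to be the source of a surprising result.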

And for the t test:

ttest_ind(female_plays_list, male_plays_list, equal_var=False)

The result confuses me: the output looks very off, and I don't think the difference between the two sample sizes explains it.

Ttest_indResult(statistic=-8.9617251652001002, pvalue=3.3195063228833119e-19)

Is there anything other than array length that could be causing this?

Solution

A test on two arrays of 100000000 random integers each, drawn from 1 to 9999 (the upper bound of np.random.randint is exclusive), gives the following result:

In []: import numpy as np

In []: from scipy.stats import ttest_ind

In []: try1 = np.random.randint(1, 10000, 100000000)

In []: try2 = np.random.randint(1, 10000, 100000000)

In []: ttest_ind(try1, try2, equal_var=False)
Out[]: Ttest_indResult(statistic=-0.67549204672468233, pvalue=0.49936320345035146)

and with unequal lengths (try1 redrawn with 1000000 values, try2 unchanged) gives the following:

In []: try1 = np.random.randint(1, 10000, 1000000)

In []: ttest_ind(try1, try2, equal_var=False)
Out[]: Ttest_indResult(statistic=-0.39754328321364363, pvalue=0.6909669583715552)

So unless I overlooked something in my test, or your arrays are even longer than these, the cause must lie in the specific values of your arrays.
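One way to see how the values themselves drive the output is to compute Welch's statistic by hand: it is the difference in means divided by a standard error that shrinks as the groups grow. The sketch below uses made-up normal data, and `welch_t` is a hypothetical helper written for illustration, not part of SciPy. It suggests that a real difference in means, even a modest one, produces a large statistic once there are many observations, whereas two samples from the same distribution (as in the test above) do not.

```python
import numpy as np
from scipy.stats import ttest_ind

def welch_t(a, b):
    """Welch's t statistic: mean difference over the unpooled standard error."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se

# Hypothetical data: true means differ by 5 in both the small and large case.
rng = np.random.default_rng(42)
small_a, small_b = rng.normal(100, 50, 200), rng.normal(105, 50, 200)
large_a, large_b = rng.normal(100, 50, 200_000), rng.normal(105, 50, 200_000)

t_small = welch_t(small_a, small_b)
t_large = welch_t(large_a, large_b)

# The hand-rolled statistic should agree with SciPy's Welch test.
scipy_large = ttest_ind(large_a, large_b, equal_var=False)

print(t_small, t_large, scipy_large.statistic)
```

If the per-user totals are heavily skewed (a few users with very large play counts), it may also be worth looking at the medians and a histogram of each group before reading much into the p-value.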