Scipy.stats Confidence Intervals for T-distribution are different than when calculated by hand


I’m trying to find the 95 percent confidence interval of the mean for the array below. Whenever I use the interval method from stats.t, it gives me a different result than my hand-calculated confidence interval. Am I using the interval method incorrectly?

I’ve attached the code which I used below.

import numpy as np
from scipy import stats

# find a 95 percent confidence interval for the mean weight of a popcorn bag
data = np.array([91, 101, 98, 98, 103, 97, 102, 105, 94, 90])

sMu = np.mean(data)
sSigma = np.std(data)
sem = stats.sem(data)
n = len(data)
df = n - 1

dist = stats.t(df)
critical_value = dist.ppf(0.975)

print(dist.interval(0.975, loc=sMu, scale=sem))
print(stats.t.interval(0.975, df=9, loc=sMu, scale=sem))

upper = sMu + (sem * critical_value)
lower = sMu - (sem * critical_value)
print('Manually Calculated: ', upper,' ', lower)

Solution

  • The ppf method (percent point function, the inverse of the CDF) of the t-distribution returns the value below which a given fraction of the probability lies. For a two-sided 95% confidence interval you call ppf(0.975) because the remaining 5% is split evenly between the two tails: the critical value must leave 2.5% in the upper tail, so 97.5% of the probability lies below it.

    The interval method, by contrast, takes the confidence level itself as its first argument and returns the endpoints of the central region containing that fraction of the probability. It works out the tail quantiles internally, so you pass 0.95, not 0.975. Passing 0.975 asks for a 97.5% interval, whose endpoints are the 0.0125 and 0.9875 quantiles, which is why it comes out wider than your hand-calculated one. Replace 0.975 with 0.95 in the following two lines:

    print(dist.interval(0.95, loc=sMu, scale=sem))
    print(stats.t.interval(0.95, df=9, loc=sMu, scale=sem))
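As a quick check, here is a minimal sketch that reuses the data from the question and compares the two approaches. The names manual, builtin and too_wide are only illustrative. The confidence level is passed positionally because, as far as I know, the keyword for that argument was renamed between SciPy versions.

```python
import numpy as np
from scipy import stats

data = np.array([91, 101, 98, 98, 103, 97, 102, 105, 94, 90])

mean = np.mean(data)
sem = stats.sem(data)   # standard error of the mean (uses ddof=1)
df = len(data) - 1

# Hand calculation: mean +/- t_crit * sem, with 2.5% left in each tail
t_crit = stats.t.ppf(0.975, df)
manual = (mean - t_crit * sem, mean + t_crit * sem)

# Built-in: pass the confidence level (0.95), not the upper-tail quantile
builtin = stats.t.interval(0.95, df, loc=mean, scale=sem)

# Passing 0.975 instead asks for the central 97.5% region, which is wider
too_wide = stats.t.interval(0.975, df, loc=mean, scale=sem)

print("manual:  ", manual)
print("built-in:", builtin)
print("0.975:   ", too_wide)
```

The first two printed intervals should agree up to floating-point rounding, while the third is wider because its endpoints correspond to ppf(0.0125) and ppf(0.9875).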