python scipy nan hypothesis-test

Why does SciPy return `nan` for a t-test with samples with 0 variance?


I am using SciPy in Python, and the following calls return nan for some reason:

>>> from scipy import stats
>>> stats.ttest_ind([1, 1], [1, 1])
Ttest_indResult(statistic=nan, pvalue=nan)

>>> stats.ttest_ind([1, 1], [1, 1, 1])
Ttest_indResult(statistic=nan, pvalue=nan)

But whenever I use samples that have different summary statistics, I actually get a reasonable value:

>>> stats.ttest_ind([1, 1], [1, 1, 1, 2])
Ttest_indResult(statistic=-0.66666666666666663, pvalue=0.54146973927558495)

Is it reasonable to interpret a p-value of nan as 0 instead? Is there a reason from statistics that it doesn't make sense to run a 2-sample t-test on samples with the same summary statistics?


Solution

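  • The t statistic is the difference of the sample means divided by a standard error built from the sample variances. When both samples are constant and equal, the numerator and the denominator are both 0, and in IEEE 754 floating-point arithmetic 0/0 evaluates to NaN (= not a number); depending on the library, this may also emit a warning or raise an exception. Be particularly careful about divide-by-N versus divide-by-(N-1) standard deviation formulae: the t-test uses the N-1 version, whereas NumPy's `np.std` and `np.var` divide by N by default (`ddof=0`).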
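
To see where the nan comes from, here is a minimal sketch of the textbook pooled-variance (equal-variance) two-sample t statistic written with NumPy. The function name `pooled_t_statistic` is made up for illustration, and this is not SciPy's actual implementation; it only assumes the standard formula, so it should reproduce the statistics from the question without claiming to match SciPy in every edge case.

```python
import numpy as np

def pooled_t_statistic(a, b):
    """Textbook equal-variance t statistic (illustrative helper, not SciPy's code)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size

    # ddof=1 selects the divide-by-(N-1) sample variance used by the t-test.
    v1 = a.var(ddof=1)
    v2 = b.var(ddof=1)

    # Pooled variance and the standard error of the difference of means.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    std_err = np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))

    # With two identical constant samples this is 0.0 / 0.0, which is nan.
    # errstate only silences NumPy's RuntimeWarning; it does not change the result.
    with np.errstate(divide="ignore", invalid="ignore"):
        return (a.mean() - b.mean()) / std_err

print(pooled_t_statistic([1, 1], [1, 1]))        # 0/0, so nan
print(pooled_t_statistic([1, 1], [1, 1, 1, 2]))  # about -0.6667, as in the question

# The divide-by-N versus divide-by-(N-1) difference on the same data:
print(np.var([1, 1, 1, 2]))          # 0.1875 (ddof=0, divide by N)
print(np.var([1, 1, 1, 2], ddof=1))  # 0.25   (divide by N-1)
```

In the first call both the difference of means and the standard error are exactly zero, so the ratio is undefined rather than zero or infinite, which is why nan is returned for both the statistic and the p-value.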