I am using SciPy in Python, and the following calls return a NaN value for whatever reason:
>>>stats.ttest_ind([1, 1], [1, 1])
Ttest_indResult(statistic=nan, pvalue=nan)
>>>stats.ttest_ind([1, 1], [1, 1, 1])
Ttest_indResult(statistic=nan, pvalue=nan)
But whenever I use samples that have different summary statistics, I actually get a reasonable value:
>>>stats.ttest_ind([1, 1], [1, 1, 1, 2])
Ttest_indResult(statistic=-0.66666666666666663, pvalue=0.54146973927558495)
Is it reasonable to interpret a p-value of NaN as 0 instead? Is there a statistical reason why it doesn't make sense to run a two-sample t-test on samples with identical summary statistics?
The NaN comes from division by zero. In IEEE-754 floating-point arithmetic, 0/0 does not raise an exception; by convention it returns the special value NaN (= not a number). Both of your samples are constant, so their sample variances are zero, which makes the denominator of the t statistic (the pooled standard error) zero. The numerator (the difference of the means) is also zero, so the statistic evaluates to 0/0 = NaN. Note that interpreting a NaN p-value as 0 points in the wrong direction: identical samples provide no evidence against the null hypothesis of equal means, so the statistic is simply undefined rather than extreme. Be particularly careful of divide-by-N versus divide-by-N-minus-one standard deviation formulae: with the N-minus-one form, a single-observation sample produces 0/0 in the variance itself.
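You can see where the NaN arises by computing the equal-variance t statistic by hand. This is a sketch of the standard pooled-variance formula, not SciPy's internal code, but it reproduces the same 0/0:

```python
import numpy as np

a = np.array([1.0, 1.0])
b = np.array([1.0, 1.0, 1.0])
n1, n2 = len(a), len(b)

# Sample variances with ddof=1 (the divide-by-N-minus-one form).
# Both are 0.0 because the samples are constant.
s1 = np.var(a, ddof=1)
s2 = np.var(b, ddof=1)

# Pooled variance for the equal-variance two-sample t-test.
sp2 = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)  # 0.0

# Numerator (difference of means) is also 0.0, so t = 0/0 = NaN.
# np.errstate suppresses the "invalid value" RuntimeWarning;
# no exception is raised either way.
with np.errstate(invalid="ignore"):
    t = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

print(t)  # nan
```

With `[1, 1, 1, 2]` as the second sample, `s2` is nonzero, the denominator is positive, and you get the finite statistic you observed.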