I'm looking for a simple solution to perform multi-factor ANOVA analysis in python. A 2-factor nested ANOVA is what I'm after, and the SPM1D python module is one way to do that, however I am having an issue.
http://www.spm1d.org/doc/Stats1D/anova.html#two-way-nested-anova
for any of the nested approach examples, there is never any F-statistic or p_values printed, nor can I find any way to print them or send them to a variable.
To go through the motions of running one of their examples, where B is nested inside A, with Y observations:
import numpy as np
from matplotlib import pyplot
import spm1d
dataset = spm1d.data.uv1d.anova2nested.SPM1D_ANOVA2NESTED_3x3()
Y,A,B = dataset.get_data()
#(1) Conduct ANOVA:
alpha = 0.05
FF = spm1d.stats.anova2nested(Y, A, B, equal_var=True)
FFi = FF.inference(0.05)
print( FFi )
#(2) Plot results:
pyplot.close('all')
FFi.plot(plot_threshold_label=True, plot_p_values=True)
pyplot.show()
The only indication of statistical significance provided is whether the h0 hypothesis is rejected or not.
> print( FFi )
SPM{F} inference list
design : ANOVA2nested
nEffects : 2
Effects:
A z=(1x101) array df=(2, 6) h0reject=True
B z=(1x101) array df=(6, 36) h0reject=False
In reality, that should be enough. However, in science, scientists like to think of something as more or less significant, which is actually kind of crap... significance is binary. But that's how they think about it, so I have to play along in order to get work published.
The example code produces a matplotlib plot, and this DOES have the f statistic and p_values on it!
#(2) Plot results:
pyplot.close('all')
FFi.plot(plot_threshold_label=True, plot_p_values=True)
pyplot.show()
But I can't seem to get any output which prints it.
FFi.get_p_values
and
FFi.get_f_values
produce the output:
<bound method SPMFiList.get_p_values <kabammi edit -- or get_f_values> of SPM{F} inference list
design : ANOVA2nested
nEffects : 2
Effects:
A z=(1x101) array df=(2, 6) h0reject=True
B z=(1x101) array df=(6, 36) h0reject=False
So I don't know what to do. Clearly the FFi.plot class can access the p_values (with plot_p_values) but FFi.get_p_values cant!!? Can anyone lend a hand?
cheers, K
The easiest way to get the p values is to use the get_p_values
method that you mention, you just need to call the method by adding ()
to the end.
p = FFi.get_p_values()
print(p)
This yields:
([0.016584151119287904], [])
To see more detailed information for each effect in 2+-way ANOVA, including p values, use print
along with the individual F statistics like this:
print( FFi[0] )
print( FFi[1] )
The first print statement will produce output like this:
SPM{F} inference field
SPM.effect : Main A
SPM.z : (1x101) raw test stat field
SPM.df : (2, 6)
SPM.fwhm : 11.79254
SPM.resels : (1, 8.47993)
Inference:
SPM.alpha : 0.050
SPM.zstar : 24.30619
SPM.h0reject : True
SPM.p_set : 0.017
SPM.p_cluster : (0.017)
You can retrieve the clusters' p values like this:
p = [F.p for F in FFi]
which gives the same result as calling get_p_values
.
Note that there are no p values in this case for FFi[1]
because the test statistic fails to cross the alpha
-defined threshold (see the "Main B" panel in the figure above). If you need to report p values in this case as well, one option is simply to use "p > alpha". More precise p value are available parametrically up until about p = 0.5, but larger p values than that are not very accurate using parametric methods, so if you need p values for all cases consider using the nonparametric version: spm1d.stats.nonparam.anova2nested
.