Tags: python, pandas, scipy, statsmodels

Scipy - statistical tests between two groups


I have two samples drawn from the population of neurons in the brain, each consisting of a thousand neurons, from two categories:

  1. cerebellum
  2. cortex

I'm now extracting multiple metrics for each sample using complex network analysis: for example, the neuron's degree of connectivity k, a discrete value in 0, 1, ..., n, or the clustering coefficient C, a continuous value between 0 and 1.

Output of df.sample(3) for each of my pandas DataFrames (the web column holds the category):

cortex:

         web     k  clustering_coeff
3080  cortex   6.0          0.733333
2951  cortex  11.0          0.428571
1435  cortex   5.0          0.563571

...

cerebellum:

             web     k  clustering_coeff
815   cerebellum  10.0          0.533333
850   cerebellum   9.0          0.416667
1213  cerebellum   7.0          0.454545
...

How can I use scipy.stats methods to compare both metrics and find out whether there is a statistically significant difference between the two groups?

The distributions look close to Gaussian but are skewed to the right, so I'm not sure which approach is best: a parametric test such as the t-test, or a non-parametric one.

Any ideas?


Solution

  • For the "k" metric (assuming both samples are stored in a single DataFrame df):

    from scipy import stats
    stats.mannwhitneyu(df.loc[df.web=="cortex", "k"], df.loc[df.web=="cerebellum", "k"])
    

    For the "clustering_coeff" metric:

    # same Mann-Whitney U test, applied to the clustering coefficient
    stats.mannwhitneyu(df.loc[df.web=="cortex", "clustering_coeff"], df.loc[df.web=="cerebellum", "clustering_coeff"])
    

    In general, use a non-parametric test when you cannot assume anything about the distribution in question.
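To show how the result of the test might be read, here is a minimal, self-contained sketch. The data is synthetic (right-skewed values generated with NumPy as a stand-in, not the measurements from the question). The helper name compare_groups and the 0.05 significance level are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data (NOT the real measurements): two groups of 1000 neurons.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "web": ["cortex"] * n + ["cerebellum"] * n,
    # Degree k: discrete counts stored as floats, as in the question's output.
    "k": np.concatenate([rng.poisson(6, n), rng.poisson(9, n)]).astype(float),
    # Clustering coefficient: continuous values between 0 and 1.
    "clustering_coeff": np.concatenate([rng.beta(5, 4, n), rng.beta(4, 5, n)]),
})

ALPHA = 0.05  # conventional significance level (an assumption, pick your own)


def compare_groups(frame, metric, group_col="web", a="cortex", b="cerebellum"):
    """Hypothetical helper: two-sided Mann-Whitney U test of `metric` between a and b."""
    x = frame.loc[frame[group_col] == a, metric]
    y = frame.loc[frame[group_col] == b, metric]
    res = stats.mannwhitneyu(x, y, alternative="two-sided")
    return res.statistic, res.pvalue


results = {}
for metric in ("k", "clustering_coeff"):
    u, p = compare_groups(df, metric)
    results[metric] = (u, p)
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{metric}: U={u:.1f}, p={p:.3g} ({verdict} at alpha={ALPHA})")
```

A p-value below the chosen level suggests that the two groups differ for that metric. With two metrics tested on the same neurons, you may also want to correct for multiple comparisons.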