Search code examples
cgslskew

gsl_stats_skew returns wrong result


I'm using GSL library 1.15 on a C application and I'm interested in computing the skewness of a data-set of doubles. Theory :

enter image description here

which, according to matlab, should translate in : enter image description here

It seems to me that the output of the gsl dedicated function gsl_stats_skew returns a wrong result. Considering the following piece of code :

const double array[] = { 2.5, 3.7, 6.6, 9.1, 9.5, 10.7, 11.9, 21.5, 22.6, 25.2 };
const skewness = gsl_stats_skew(array, 1, 10);
printf("result : %f\n", skewness);
  • the expected result is 0.5751
  • the returned result is 0.41408

What am I missing?


Solution

  • The reason is that there are different ways to implement the skewness function, in particular :

    • biased skewness : it's matlab's default and excel's SKEW.P function, which equals to 0.4850 on the previous dataset
    • unbiased skewness : matlab computes it when you add flag=0, it's excel's SKEW function and it's the common way online calculators compute it => 0.57551
    • gsl skewness : it's the biased version with a variation : the standard deviation is computed by using 1/(N-1) factor instead of 1/N => 0.41408

    Here's my C implementation of the biased version, hope it helps :

    double skewness(const double elements[], const int numElements){
      const double media = gsl_stats_mean(elements, 1, numElements);
      const double stDev = gsl_stats_sd_with_fixed_mean(elements, 1, numElements, media); // use 1/N normalization factor instead of 1/(N-1)
      double sum_numerator=0;
      for(int i=0; i<numElements; i++) sum_numerator += pow((elements[i] - media),3);
      const double numerator = sum_numerator / numElements;
      const double denominator = pow(stDev,3);
      return numerator / denominator; }