Tags: statistics, gnuplot, mean, standard-deviation

gnuplot frequency table stats


I have a frequency table and would like to calculate its mean and standard deviation. The first column is the frequency and the second is the data value. The mean I need is the frequency-weighted mean, (446*0 + 864*1 + 277*2 + ... + 1*12)/(446 + 864 + 277 + ... + 1) = ~1.35, yet when I use gnuplot's stats command, it reports statistics for each column separately. How can I change my code so that it gives me the output I want?

Data table:

446 0
864 1
277 2
111 3
62  4
32  5
19  6
9   7
8   8
3   10
3   11
1   12

Gnuplot code:

stats "$input" using 2:1

Output:

* FILE: 
  Records:           12
  Out of range:       0
  Invalid:            0
  Column headers:     0
  Blank:              0
  Data Blocks:        1

* COLUMNS:
  Mean:               5.7500           152.9167
  Std Dev:            3.7887           251.5374
  Sample StdDev:      3.9572           262.7223
  Skewness:           0.1569             1.9131
  Kurtosis:           1.8227             5.5436
  Avg Dev:            3.2500           188.0417
  Sum:               69.0000          1835.0000
  Sum Sq.:          569.0000        1.03986e+06

  Mean Err.:          1.0937            72.6126
  Std Dev Err.:       0.7734            51.3449
  Skewness Err.:      0.7071             0.7071
  Kurtosis Err.:      1.4142             1.4142

  Minimum:            0.0000 [ 0]        1.0000 [11]
  Maximum:           12.0000 [11]      864.0000 [ 1]
  Quartile:           2.5000             5.5000
  Median:             5.5000            25.5000
  Quartile:           9.0000           194.0000

  Linear Model:       y = -46.89 x + 422.5
  Slope:              -46.89 +- 14.86
  Intercept:          422.5 +- 102.4
  Correlation:        r = -0.7062
  Sum xy:             2475

Solution

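  • Pass the product of the two columns (frequency*value) as the first stats column and the frequency as the second. The weighted mean is then the ratio of the two column sums, STATS_sum_x/STATS_sum_y: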

    Code:

    ### special mean
    reset session
    
    $Data <<EOD
    446 0
    864 1
    277 2
    111 3
    62  4
    32  5
    19  6
    9   7
    8   8
    3   10
    3   11
    1   12
    EOD
    
    # x = frequency*value, y = frequency
    stats $Data u ($1*$2):1
    
    # sum of frequency*value, sum of frequencies
    print STATS_sum_x, STATS_sum_y
    # weighted mean
    print STATS_sum_x/STATS_sum_y
    ### end of code
    

    Result:

    * FILE: 
      Records:           12
      Out of range:       0
      Invalid:            0
      Column headers:     0
      Blank:              0
      Data Blocks:        1
    
    * COLUMNS:
      Mean:             206.2500           152.9167
      Std Dev:          252.3441           251.5374
      Sample StdDev:    263.5648           262.7223
      Skewness:           1.5312             1.9131
      Kurtosis:           4.2761             5.5436
      Avg Dev:          195.6667           188.0417
      Sum:             2475.0000          1835.0000
      Sum Sq.:       1.27460e+06        1.03986e+06
    
      Mean Err.:         72.8455            72.6126
      Std Dev Err.:      51.5095            51.3449
      Skewness Err.:      0.7071             0.7071
      Kurtosis Err.:      1.4142             1.4142
    
      Minimum:            0.0000 [ 0]        1.0000 [11]
      Maximum:          864.0000 [ 1]      864.0000 [ 1]
      Quartile:          31.5000             5.5000
      Median:            89.0000            25.5000
      Quartile:         290.5000           194.0000
    
      Linear Model:       y = 0.7622 x - 4.279
      Slope:              0.7622 +- 0.2032
      Intercept:          -4.279 +- 66.21
      Correlation:        r = 0.7646
      Sum xy:             9.609e+05
    

    The two sums and their ratio, which is the weighted mean you asked for:

    2475.0 1835.0
    1.34877384196185
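
The question also asks for the standard deviation, which the code above does not print. A minimal sketch of one way to get it with the same approach, assuming the population form (dividing by the total frequency N) is what is wanted: a second stats call sums frequency*value^2, and the variance is that sum divided by N minus the squared mean. The variable names N, mean and var are only illustrative.

```gnuplot
### weighted mean and standard deviation from a frequency table (sketch)
reset session

$Data <<EOD
446 0
864 1
277 2
111 3
62  4
32  5
19  6
9   7
8   8
3   10
3   11
1   12
EOD

# first pass: x = frequency*value, y = frequency
stats $Data using ($1*$2):1 nooutput
N    = STATS_sum_y            # total number of observations
mean = STATS_sum_x / N        # frequency-weighted mean

# second pass: x = frequency*value^2
stats $Data using ($1*($2**2)):1 nooutput
var  = STATS_sum_x / N - mean**2   # population variance (assumption: divide by N)

print sprintf("N = %d, mean = %.4f, std dev = %.4f", N, mean, sqrt(var))
### end of code
```

For this table the standard deviation comes out at about 1.45. If the sample standard deviation is wanted instead, multiplying var by N/(N-1.0) before taking the square root should give it.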