Search code examples
gnuplot

Gnuplot fitting algorithm seems to fail


I have problems with running even a relatively simple gnuplot fit. For explanatory reasons, I condensed the problem to a simple perfect Gaussian function with the following data:

2441970000 1.098423341
2441972000 1.643213215
2441974000 2.390861121
2441976000 3.383382081
2441978000 4.656761592
2441980000 6.233805219
2441982000 8.116311684
2441984000 10.27780726
2441986000 12.65839042
2441988000 15.16326649
2441990000 17.66620695
2441992000 20.01843507
2441994000 22.06242256
2441996000 23.64898672
2441998000 24.65517792
2442000000 25
2442002000 24.65517792
2442004000 23.64898672
2442006000 22.06242256
2442008000 20.01843507
2442010000 17.66620695
2442012000 15.16326649
2442014000 12.65839042
2442016000 10.27780726
2442018000 8.116311684
2442020000 6.233805219
2442022000 4.656761592
2442024000 3.383382081
2442026000 2.390861121
2442028000 1.643213215
2442030000 1.098423341

I defined the following function to fit:

g(x) = b/(sigma*sqrt(2.*pi))*exp(-(x-x0)**2./(2.*sigma**2))
b = 700000
sigma = 12000
x0 = 2442000000

Then I fitted the data above (contained in the file MyData.csv):

fit g(x) "MyData.csv" via b,sigma,x0

The fit converges, but it doesn't really change any of the parameters:

iter      chisq       delta/lim  lambda   b             sigma         x0           
   0 3.1760646658e+01   0.00e+00  7.87e+03    7.000000e+05   1.200000e+04   2.442000e+09
   * 6.6450374757e+03   9.95e+04  7.87e+04    7.000001e+05   1.200000e+04   2.441821e+09
   * 6.6450360342e+03   9.95e+04  7.87e+05    7.000000e+05   1.200000e+04   2.441913e+09
   * 9.1399191811e+01   6.53e+04  7.87e+06    7.000000e+05   1.200000e+04   2.441998e+09
   * 3.1766736930e+01   1.92e+01  7.87e+07    7.000000e+05   1.200000e+04   2.442000e+09
   * 3.1760647267e+01   1.92e-03  7.87e+08    7.000000e+05   1.200000e+04   2.442000e+09
   * 3.1760646658e+01   1.94e-07  7.87e+09    7.000000e+05   1.200000e+04   2.442000e+09
   * 3.1760646658e+01   1.12e-10  7.87e+10    7.000000e+05   1.200000e+04   2.442000e+09
   * 3.1760646658e+01   1.01e-10  7.87e+11    7.000000e+05   1.200000e+04   2.442000e+09
   1 3.1760646658e+01   0.00e+00  7.87e+10    7.000000e+05   1.200000e+04   2.442000e+09
iter      chisq       delta/lim  lambda   b             sigma         x0           

After 1 iterations the fit converged.
final sum of squares of residuals : 31.7606
rel. change during last iteration : 0

degrees of freedom    (FIT_NDF)                        : 28
rms of residuals      (FIT_STDFIT) = sqrt(WSSR/ndf)    : 1.06504
variance of residuals (reduced chisquare) = WSSR/ndf   : 1.13431

Final set of parameters            Asymptotic Standard Error
=======================            ==========================
b               = 700000           +/- 1.34e+17     (1.915e+13%)
sigma           = 12000            +/- 242.3        (2.019%)
x0              = 2.442e+09        +/- 4.676e+17    (1.915e+10%)

correlation matrix of the fit parameters:
                b      sigma  x0     
b               1.000 
sigma          -0.073  1.000 
x0              1.000 -0.073  1.000 

Then I plotted the data and the fit curve:

plot "MyData.csv", g(x)

and got this:

This is clearly a bad approximation for the data; it doesn't even deviate from my initial parameter settings. And these settings were so close to the actual data distribution that the Levenberg-Marquardt algorithm surely should have worked.

So I must be doing something fundamentally wrong, but what?


Solution

  • I cannot explain in detail what exactly is going wrong (maybe someone else can), but when you are dealing with exponential functions "strange" things can happen, especially when your x-values are very large. And although you are taking the difference, keep in mind the limited numeric precision, e.g. exp(-707) = 0 or exp(710) = undefined value. I could imagine if the fitting algorithm gets some zero values during varying the parameters this might mess up the fitting procedure.

    By the way, you used a floating point number in the exponent. During testing I also noticed that, e.g.

    print (-1)**2      #   1
    print (-1.0)**2    #   1.0
    print (-1)**2.0    #   {1.0, -2.44921270764475e-16}   # complex number
    

    So, my conclusion from this would be if you mean squared, better don't use **2.0, but **2 only in order to avoid rounding errors.

    Anyway, one possible solution I guess is to shift your data values close to zero and scale them. Haven't found out yet whether set fit prescale or other fitting parameters might offer a simpler solution (check help set fit).

    Although I start with values b0=1000, sigma0=10, x0=0.1 the fitting finds the values b0=751, sigma0=12, x0=3e-12. After fitting the shifted and scaled data, shift and scale the fitting parameter as well. Maybe there are better and smarter solutions.

    Script:

    ### scaling data to be fitted
    reset session
    
    $Data <<EOD
    2441970000 1.098423341
    2441972000 1.643213215
    2441974000 2.390861121
    2441976000 3.383382081
    2441978000 4.656761592
    2441980000 6.233805219
    2441982000 8.116311684
    2441984000 10.27780726
    2441986000 12.65839042
    2441988000 15.16326649
    2441990000 17.66620695
    2441992000 20.01843507
    2441994000 22.06242256
    2441996000 23.64898672
    2441998000 24.65517792
    2442000000 25
    2442002000 24.65517792
    2442004000 23.64898672
    2442006000 22.06242256
    2442008000 20.01843507
    2442010000 17.66620695
    2442012000 15.16326649
    2442014000 12.65839042
    2442016000 10.27780726
    2442018000 8.116311684
    2442020000 6.233805219
    2442022000 4.656761592
    2442024000 3.383382081
    2442026000 2.390861121
    2442028000 1.643213215
    2442030000 1.098423341
    EOD
    
    b0     = 1000
    sigma0 = 10
    x0     = 0.1
    
    x00       = 2442000000.
    c00       = 1000.
    scaleX(x) = (x-x00)/c00
    
    g(x)  = b0/(sigma0*sqrt(2.*pi))*exp(-(x-x0)**2/(2.*sigma0**2))
    fit g(x) $Data u (scaleX($1)):2 via b0,sigma0,x0
    
    x0     = x0 + x00
    sigma0 = sigma0 * c00
    b0     = b0 * c00
    set label at graph 0.1, 0.9 sprintf("b0 = %.0f\nsigma0 = %.0f\nx0 = %.0f",b0,sigma0,x0)
    
    set format x "%.0f"
    plot $Data u 1:2 , g(x)
    ### end of script
    

    Result:

    Final set of parameters            Asymptotic Standard Error
    =======================            ==========================
    b0              = 751.988          +/- 2.629e-08    (3.496e-09%)
    sigma0          = 12               +/- 4.873e-10    (4.061e-09%)
    x0              = 3.62444e-12      +/- 4.714e-10    (1.301e+04%)
    

    enter image description here