How do I implement linear extrapolation in gnuplot?


I can't get a correct linear extrapolation. Here is the graph.

The fitted line should clearly slope downward: the few high points are on the left, while the many points on the right are all low. Sample data:

1.12.2010;6700
5.12.2010;330000
8.12.2010;45300
12.12.2010;15400
5.05.2011;5300
31.05.2011;1500
2.06.2011;11400
24.11.2011;51000
19.03.2012;3300
....

I draw the graph using the following script.

#! /usr/bin/gnuplot -persist
set terminal postscript eps enhanced color solid
set output "result.ps"
set grid xtics ytics

set datafile separator ";"
set xtics rotate by 45 right

set xdata time
set timefmt "%d.%m.%Y"

# The equation
f(x) = a*x + b
fit f(x) "q.csv" u 1:2 via a,b

plot "q.csv" using 1:2 title "DATA" with p linestyle 3 lt 7 lw 2, \
  f(x) w l lt 1 lw 2 title "trendline"

This is the part that does the linear fit, as described in many places. For example, here. Logically it should work, but it does not:

# The equation
f(x) = a*x + b
fit f(x) "q.csv" u 1:2 via a,b

What am I doing wrong?

I tested it and tried what was suggested there, but it didn't help. This is what I did:

# find out the StartDate
StartDate = "1.12.2010"          # manually by setting a value

f(x) = a*(x-StartDate) + b
set fit brief nolog
b=10
fit f(x) "q.csv" u 1:2 via a,b
set key top left
set format x "%d.%m.%Y" timedate

plot "q.csv" u 1:2 ti "Data" with linespoints linestyle 1 pt 7 ps 1, \
 f(x) w l lc rgb "red" ti "Fit"

Result

But in Excel


Solution

  • The fit fails mainly because (1) the general Marquardt-Levenberg algorithm is not the best algorithm for solving a linear least-squares problem and (2) the resulting solutions for a and b differ by several orders of magnitude.

    To deal with (2), you can experiment with the initial values for a and b. Try:

    a = 0.001
    b = 150000
    

    This should help. If it does not, you can deal with (1) by "converting" Marquardt-Levenberg into a one-step Gauss-Newton method: set the following variables before running the fit command (see help set fit, or help fit control variables for older versions of gnuplot):

    set fit lambda_factor 1
    set fit start_lambda 0.00001
    
    ### or for older versions of gnuplot
    # FIT_START_LAMBDA=0.00001
    # FIT_LAMBDA_FACTOR=1
    

    Note that gnuplot still needs two steps: one for finding the solution and one for verifying that it has converged.


    As @theozh points out, it often helps to shift the x-values by using f(x) = a*(x-StartDate) + b. It might also help to scale the parameters so that their fitted values have similar magnitudes, for example f(x) = a*x/1000 + 1000*b. You can also try combining both approaches.
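
The suggestions above could be combined roughly as follows. This is only a sketch, not a tested solution for this particular data set: it assumes gnuplot 5, the file q.csv from the question, and that StartDate has to be a number of seconds (obtained here with strptime) for the subtraction x-StartDate to shift the data as intended. The starting values for a and b are the guesses given above.

```gnuplot
# Sketch: shifted linear fit on time data (assumes gnuplot 5 and q.csv from the question)
set datafile separator ";"
set xdata time
set timefmt "%d.%m.%Y"
set format x "%d.%m.%Y"
set xtics rotate by 45 right

# Convert the start date to seconds so it can be subtracted from x
StartDate = strptime("%d.%m.%Y", "1.12.2010")

# Shifted model: a is the slope per second, b is the fitted value at StartDate
f(x) = a*(x - StartDate) + b

# Initial guesses taken from the answer (assumed to be of the right magnitude)
a = 0.001
b = 150000

# Make Marquardt-Levenberg behave like a one-step Gauss-Newton
set fit lambda_factor 1
set fit start_lambda 0.00001

fit f(x) "q.csv" using 1:2 via a, b

plot "q.csv" using 1:2 with points pt 7 title "Data", \
     f(x) with lines lc rgb "red" title "Fit"
```

With the shift in place, b is the value of the trend line at the start date rather than at the time origin, which tends to keep a and b closer in magnitude. If the fit still struggles, the parameter scaling mentioned above can be applied to the same f(x).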