The GSL library documentation on the multidimensional minimization algorithms reads:
You must provide a parametric function of n variables for the minimizers to operate on. You may also need to provide a routine which calculates the gradient of the function and a third routine which calculates both the function value and the gradient together.
The example provided defines such functions as follows (I omitted problem-specific details, replaced by ...):
The function f itself:
double
my_f (const gsl_vector *v, void *params)
{
  ...
  return rv;
}
The gradient of f, df = (df/dx, df/dy):
void
my_df (const gsl_vector *v, void *params, gsl_vector *df)
{
  ...
  gsl_vector_set(df, ...);
  gsl_vector_set(df, ...);
}
And finally, the third function, which computes both f and df together:
void
my_fdf (const gsl_vector *x, void *params, double *f, gsl_vector *df)
{
  *f = my_f(x, params);
  my_df(x, params, df);
}
Pointers to these three functions are members of the struct type gsl_multimin_function_fdf, which is eventually passed to the minimizer.
There are several cases in which once the function value is calculated, its derivative may be more easily calculated. For example, let f(x,y) = exp(x * g(y)), where g(y) may be expensive to compute. Then it is convenient to compute simply df/dx = g(y) f(x,y), recovering g(y) from the already computed value as g(y) = log(f)/x.
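Spelled out, the relations I have in mind are the following. The df/dy line and the symbol g' (the derivative of g) are not stated above and are added here only for completeness; the last line also assumes x is nonzero.

```latex
\begin{aligned}
f(x,y) &= \exp\bigl(x\,g(y)\bigr),\\
\frac{\partial f}{\partial x} &= g(y)\,f(x,y),\\
\frac{\partial f}{\partial y} &= x\,g'(y)\,f(x,y),\\
g(y) &= \frac{\log f(x,y)}{x} \qquad (x \neq 0).
\end{aligned}
```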
Now, as far as I can tell from the example, the minimizer requires the function and its derivative to be defined independently, while the third definition looks like a dummy wrapper.
Is it possible to define these functions in a way such that the function and its derivative can actually be calculated within the same scope?
Edit:
In the documentation, regarding fdf, it is stated:
This function provides an optimization of the separate functions for f(x) and g(x)
—it is always faster to compute the function and its derivative at the same time.
Yet I'm not certain how. Scanning through the header, I found three macros defined, one for each of these three functions:
#define GSL_MULTIMIN_FN_EVAL_F(F,x) (*((F)->f))(x,(F)->params)
#define GSL_MULTIMIN_FN_EVAL_DF(F,x,g) (*((F)->df))(x,(F)->params,(g))
#define GSL_MULTIMIN_FN_EVAL_F_DF(F,x,y,g) (*((F)->fdf))(x,(F)->params,(y),(g))
which seem to be invoked selectively, depending on the optimization algorithm used. Could someone confirm this, please? Back to my original question: does this imply that the library user has to check the source to find out which method to use in order to take advantage of the possibility of computing both the function value and its gradient together?
The GSL asks for three functions, (a) one that calculates the value, (b) another that calculates the gradient, and (c) one that calculates both, for exactly the reason you are concerned with:
There are several cases in which once the function value is calculated, its derivative may be more easily calculated.
In other words, it can be easier to calculate both value and gradient in one scope than to calculate value and gradient separately. However, evaluating both will be needlessly expensive if the minimizer only needs the gradient, or if the minimizer only needs the value.
Therefore, you should trust that the GSL knows what it wants. The minimizer will call the third function whenever it needs both the value and the gradient at a specific point, and it will call the first or second function if it needs only the value or only the gradient.
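To make the calling pattern concrete, here is a toy sketch in plain C. It is not a GSL algorithm and makes no claim about which callback a real GSL minimizer invokes or how often. The struct `toy_function_fdf`, the routine `toy_minimize`, the counters, and the 1-D test function are all hypothetical names invented for this illustration; only the idea of three callbacks plus `params` mirrors gsl_multimin_function_fdf.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical 1-D stand-in for the three callbacks of
   gsl_multimin_function_fdf (not the real GSL type). */
typedef struct {
    double (*f)(double x, void *params);
    void   (*df)(double x, void *params, double *g);
    void   (*fdf)(double x, void *params, double *f, double *g);
    void   *params;
} toy_function_fdf;

/* Toy gradient descent with step halving; NOT a GSL algorithm. */
double toy_minimize(const toy_function_fdf *F, double x, int max_iter)
{
    double fx, gx;
    F->fdf(x, F->params, &fx, &gx);      /* value and gradient at the start */
    for (int i = 0; i < max_iter; ++i) {
        if (fabs(gx) < 1e-12)            /* stationary point reached */
            break;
        double step = 1.0;
        double trial = x - step * gx;
        /* Line search: only function values are needed at trial points. */
        while (F->f(trial, F->params) >= fx && step > 1e-12) {
            step *= 0.5;
            trial = x - step * gx;
        }
        x = trial;
        /* Accepted point: both value and gradient are needed. */
        F->fdf(x, F->params, &fx, &gx);
    }
    return x;
}

/* Hypothetical test problem f(x) = (x - 3)^2 that counts its calls. */
typedef struct { int n_f, n_df, n_fdf; } call_counts;

double quad_f(double x, void *p)
{
    ((call_counts *)p)->n_f++;
    return (x - 3.0) * (x - 3.0);
}

void quad_df(double x, void *p, double *g)
{
    ((call_counts *)p)->n_df++;
    *g = 2.0 * (x - 3.0);
}

void quad_fdf(double x, void *p, double *f, double *g)
{
    ((call_counts *)p)->n_fdf++;
    *f = (x - 3.0) * (x - 3.0);
    *g = 2.0 * (x - 3.0);
}
```

In this toy, trial points of the line search go through `f` alone, while every accepted point goes through `fdf`. The gradient-only callback is never used here; an algorithm that needs a gradient without a value would call `df` instead.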
It is up to you to decide whether you want the third function to perform some smart calculations that take advantage of the specific problem, or whether you want it to be a simple wrapper.