I am quite new to C++ and Rcpp integration. I am required to create a program using C++ with R integration to find the MLE/root of a Cauchy Distribution.
Below is my code thus far.
#include <Rcpp.h>
#include <math.h>
#include <iostream>
#include <cstdlib>
using namespace std;
using namespace Rcpp;
// [[Rcpp::export]]
double Cauchy(double x, double y); //Declare Function
double Cauchy(double x,double y) //Define Function
{
return 1/(M_PI*(1+(pow(x-y,2)))); //write the equation whose roots are to be determined x=chosen y=theta
}
using namespace std;
// [[Rcpp::export]]
int Secant (NumericVector x){
NumericVector xvector(x) ; //Input of x vector
double eplison= 0.001; //Threshold
double a= xvector[3]; //Select starting point
double b= xvector[4];//Select end point
double c= 0.0; //initial value for c
double Theta= 10.6; //median value for theta estimate
int noofIter= 0; //Iterations
double error = 0.0;
if (std::abs(Cauchy(a, Theta) <(std::abs(Cauchy(a, Theta))))) {
do {
a=b;
b=c;
error= (b-(Cauchy(b, Theta)))*((a-b)/(Cauchy(a, Theta)-Cauchy(b, Theta)));
error= Cauchy(c,Theta);
//return number of iterations
noofIter++;
for (int i = 0; i < noofIter; i += 1) {
cout << "The Value is " << c << endl;
cout << "The Value is " << a << endl;
cout << "The Value is " << b << endl;
cout << "The Value is " << Theta << endl;
}
} while (std::abs(error)>eplison);
}
cout<<"\nThe root of the equation is occurs at "<<c<<endl; //print the root
cout << "The number of iterations is " << noofIter;
return 0;
}
With a few amendments, the program either goes into a never ending loop or returns a value which is infinitely small.
My understanding of this mathematics is limited. Therefore any help or correction in that would be greatly appreciated.
The X vector that we have been given as an output is
x <- c( 11.262307 , 10.281078 , 10.287090 , 12.734039 ,
11.731881 , 8.861998 , 12.246509 , 11.244818 ,
9.696278 , 11.557572 , 11.112531 , 10.550190 ,
9.018438 , 10.704774 , 9.515617 , 10.003247 ,
10.278352 , 9.709630 , 10.963905 , 17.314814)
Using previous R code we know that the MLE/Root for this distribution occurs at 10.5935 approx
The code used to obtain this MLE was
optimize(function(theta)-sum(dcauchy(x, location=theta,
log=TRUE)), c(-100,100))
Thanks!
With the optimize()
function you are directly searching for an extremum of the likelihood. An alternative is to use a root finding algorithm (e.g. the secant method) together with the derivative of the (log-)likelihood. From Wikipedia we get the formula we have to solve. In R this could look like this:
x <- c( 11.262307 , 10.281078 , 10.287090 , 12.734039 ,
11.731881 , 8.861998 , 12.246509 , 11.244818 ,
9.696278 , 11.557572 , 11.112531 , 10.550190 ,
9.018438 , 10.704774 , 9.515617 , 10.003247 ,
10.278352 , 9.709630 , 10.963905 , 17.314814)
ld <- function(sample, theta){
xp <- outer(sample, theta, FUN = "-")
colSums(xp/(1+xp^2))
}
uniroot(ld, sample = x, lower = 0, upper = 20)$root
#> [1] 10.59724
Note that the derivative of the log-likelihood is vectorized on both arguments. This allows for easy plotting:
theta <- seq(0, 20, length=500)
plot(theta, ld(x, theta), type="l",
xlab=expression(theta), ylab=expression(ld(x, theta)))
From this plot we already see that it will be tricky to find the correct starting points for the secant method to work.
Let's move this to C++ (C++11 to be precise):
#include <Rcpp.h>
// [[Rcpp::plugins(cpp11)]]
Rcpp::List secant(const std::function<double(double)>& f,
double a, double b, int maxIterations, double epsilon) {
double c(0.0);
do {
c = b * (1 - (1 - a/b) / (1 - f(a)/f(b)));
a = b;
b = c;
} while (maxIterations-- > 0 && std::abs(a - b) > epsilon);
return Rcpp::List::create(Rcpp::Named("root") = c,
Rcpp::Named("f.root") = f(c),
Rcpp::Named("converged") = (maxIterations > 0));
}
// [[Rcpp::export]]
Rcpp::List mleCauchy(const Rcpp::NumericVector& sample, double a, double b,
int maxIterations = 100, double epsilon = 0.0001) {
auto f = [&sample](double theta) {
Rcpp::NumericVector xp = sample - theta;
xp = xp / (1 + xp * xp);
return Rcpp::sum(xp);
};
return secant(f, a, b, maxIterations, epsilon);
}
/*** R
x <- c( 11.262307 , 10.281078 , 10.287090 , 12.734039 ,
11.731881 , 8.861998 , 12.246509 , 11.244818 ,
9.696278 , 11.557572 , 11.112531 , 10.550190 ,
9.018438 , 10.704774 , 9.515617 , 10.003247 ,
10.278352 , 9.709630 , 10.963905 , 17.314814)
mleCauchy(x, 11, 15)
#-> does not converge
mleCauchy(x, 11, 14)
#-> 10.59721
mleCauchy(x, mean(x), median(x))
#-> 10.59721
*/
The secant()
function works for any std::function
that takes a double
as argument and returns a double
. Such a function is than defined as lambda function that depends on the provided sample values. As expected one gets the correct root only when starting with values that are close to the correct value.
Lambda functions might be a bit confusing at first sight, but they are very close to what we are used to in R. Here the same algorithm written in R:
secant <- function(f, a, b, maxIterations, epsilon) {
for (i in seq.int(maxIterations)) {
c <- b * (1 - (1 - a/b) / (1 - f(a)/f(b)))
a <- b
b <- c
if (abs(a - b) <= epsilon)
break
}
list(root = c, f.root = f(c), converged = (i < maxIterations))
}
mleCauchy <- function(sample, a, b, maxIterations = 100L, epsilon = 0.001) {
f <- function(theta) {
xp <- sample - theta
sum(xp/(1 + xp^2))
}
secant(f, a, b, maxIterations, epsilon)
}
x <- c( 11.262307 , 10.281078 , 10.287090 , 12.734039 ,
11.731881 , 8.861998 , 12.246509 , 11.244818 ,
9.696278 , 11.557572 , 11.112531 , 10.550190 ,
9.018438 , 10.704774 , 9.515617 , 10.003247 ,
10.278352 , 9.709630 , 10.963905 , 17.314814)
mleCauchy(x, 11, 12)
#-> 10.59721
The R function f
and the lambda function f
take the vector sample
from the environment where they are defined. In R this happens implicitly, while in C++ we have to explicitly tell that this value should be captured. The numeric theta
is the argument supplied when the function is called, i.e. the successive estimates for the root starting with a
and b
.