
Manually coded Poisson log likelihood function returns a different result from glm for models with an interaction


I've coded my own Poisson log likelihood function, but it returns values that differ noticeably from glm for a model with an interaction on one particular dataset. Note that the function produces exactly the same results as glm on all other data I've tried, and also for the model without the interaction on this dataset.

> # Log likelihood function
> llpoi = function(X, y){
+   # Ensure X is a matrix
+   if(!is.matrix(X)) X = as.matrix(X)
+   # Ensure there's a constant (a column of ones for the intercept)
+   if(sum(X[, 1]) != nrow(X)) X = cbind(1, X)  
+   # Number of parameters to estimate
+   k = ncol(X)
+   ## Function to be maximized
+   FUN = function(par, X, y){
+     # beta hat -- the parameters we're trying to estimate
+     betahat = par[1:k]
+     # mu hat -- the linear predictor (the log of the Poisson mean)
+     muhat = X %*% betahat
+     # Log likelihood, dropping the -sum(log(factorial(y))) term,
+     # which is constant in beta
+     sum(muhat * y - exp(muhat))
+   }
+   # Optimizing
+   opt = optim(rep(0, k), fn = FUN, y = y, X = X, control = list(fnscale = -1), method = "BFGS", hessian = TRUE)
+   # Results, with SEs from the inverse of the negative hessian
+   cbind(opt$par, sqrt(diag(solve(-1 * opt$hessian))))
+ }
> 
> # Defining inputs 
> y = c(2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 3, 1, 1, 3, 2, 2, 2, 3, 1, 2, 4, 3, 3, 3, 1, 3, 0, 2, 1, 2, 4, 1, 2, 0, 2, 1, 2, 1, 4, 1, 2, 0)
> x1 = c(8, 1, 0, 3, 3, 3, 5, 4, 0.4, 1.5, 2, 1, 1, 7, 2, 3, 0, 2, 1.5, 5, 1, 4, 5.5, 6, 3, 3, 2, 0.5, 5, 10, 3, 22, 20, 3, 20, 10, 15, 25, 15, 6, 3.5, 5, 18, 2, 15.0, 16, 24)
> x2 = c(12, 12, 12, 16, 12, 12, 12, 12, 12, 12, 12, 12, 9, 9, 12, 9, 12, 12, 9, 16, 9, 6, 12, 9, 9, 12, 12, 12, 12, 14, 14, 14, 9, 12, 9, 12, 3, 12, 9, 6, 12, 12, 12, 12, 12, 12, 9)
> 
> # Results
> withmyfun = llpoi(cbind(x1, x2, x1 * x2), y)
> round(withmyfun, 2)
      [,1] [,2]
[1,]  0.96 0.90
[2,] -0.05 0.09
[3,] -0.02 0.08
[4,]  0.00 0.01
> withglm = glm(y ~ x1 + x2 + x1 * x2, family = "poisson")
> round(summary(withglm)$coef[, 1:2], 2)
            Estimate Std. Error
(Intercept)     1.08       0.90
x1             -0.07       0.09
x2             -0.03       0.08
x1:x2           0.00       0.01

Is this something specific to the data? Is it inherent to the optimization process, which will occasionally diverge more noticeably from glm, and I just got unlucky with this dataset? Is it a consequence of using method = "BFGS" in optim?


Solution

  • After much research, I learned that the two results differ because glm.fit, the workhorse behind glm, maximizes the likelihood by the Newton-Raphson method (iteratively reweighted least squares, which coincides with Newton-Raphson for the canonical log link), while I used BFGS in my llpoi function. BFGS is faster but less precise. The two results will be very similar in most cases, but can differ more significantly when the likelihood surface is too flat or has multiple maxima, as correctly pointed out by amatsuo_net, because then the hill-climbing algorithm used by BFGS gets stuck; the sketch below illustrates the Newton-Raphson alternative.
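
To make the comparison concrete, here is a minimal Newton-Raphson fit (equivalently, Fisher scoring / IRLS for this model) for the Poisson log-link regression. It is only a sketch of the kind of update glm.fit performs, not its actual code; the function name newtpoi is made up for this example, and there are no step-halving safeguards, so it assumes the data are as well behaved as those above:

newtpoi = function(X, y, tol = 1e-10, maxit = 50){
  # Same input handling as llpoi above
  if(!is.matrix(X)) X = as.matrix(X)
  if(sum(X[, 1]) != nrow(X)) X = cbind(1, X)
  beta = rep(0, ncol(X))
  for(i in seq_len(maxit)){
    mu = as.vector(exp(X %*% beta))   # fitted means at the current beta
    score = t(X) %*% (y - mu)         # gradient of the log likelihood
    info = t(X) %*% (X * mu)          # Fisher information, X' diag(mu) X
    step = solve(info, score)         # Newton step
    beta = beta + as.vector(step)
    if(max(abs(step)) < tol) break    # stop once the step is negligible
  }
  # Estimates and SEs from the inverse information, like glm's summary
  cbind(beta, sqrt(diag(solve(info))))
}

On the data above, round(newtpoi(cbind(x1, x2, x1 * x2), y), 2) should reproduce glm's coefficients and standard errors. Alternatively, keeping BFGS but tightening its stopping rule, e.g. control = list(fnscale = -1, reltol = 1e-16, maxit = 500) in the optim call (the default reltol is about 1e-8), should also move llpoi's answer much closer to glm's on a flat surface like this one.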