Search code examples
rmathematical-optimizationlmweighted

Rebuild weighted regression with optim in R


I can successfully rebuild a linear regression in R using optim, but I get a wrong result when I also use a weight.

df <- mtcars
df$weight <- df$disp / mean(df$disp) #just an example

Here are the correct results using lm:

summary(lm(mpg ~ cyl, data = df))$r.squared
[1] 0.72618

summary(lm(mpg ~ cyl, data = df, weights = weight))$r.squared
[1] 0.6794141

Here the function and optim for the unweighted version that works:

minX.RSQ <- function(par) {
  esti <- par[1] + par[2]*df[,'cyl']
  resi <- (esti - df[,'mpg'])**2
  RSS <- sum(resi)
  SST <- sum((df[,'mpg'] - mean(df[,'mpg']))**2)
  -(1 - RSS / SST)
}
optim(par = c(0,0), minX.RSQ)$value
[1] -0.72618 #which is correct, cp. above

Here the function and optim for the weighted version, I tried weighted sums, but the result is NOT correct:

minW.RSQ <- function(par) {
  esti <- par[1] + par[2]*df[,'cyl']
  resi <- (esti - df[,'mpg'])**2 
  RSS <- sum(resi * df[,'weight'])
  SST <- sum((df[,'mpg'] - mean(df[,'mpg']))**2 * df[,'weight'])
  -(1 - RSS / SST)
}
optim(par = c(0,0), minW.RSQ)$value
[1] -0.7521823 #which is NOT correct, cp. above

How can I fix the function for the weighted version?


Solution

  • You are really close. You just need to use a weighted mean in your SST calc:

    minW.RSQ <- function(par) {
      esti <- par[1] + par[2]*df[,'cyl']
      resi <- (esti - df[,'mpg'])**2 
      RSS <- sum(resi * df[,'weight'])
      SST <- sum((df[,'mpg'] - weighted.mean(df[,'mpg'], w = df[,'weight']))**2 * df[,'weight'])
      -(1 - RSS / SST)
    }
    optim(par = c(0,0), minW.RSQ)$value
    # [1] -0.6794141