How does R calculate the regression coefficients in lm()?

I want to replicate R's estimate of the regression coefficients on the data below:

set.seed(1)
Vec = rnorm(1000, 100, 3)
DF = data.frame(X1 = Vec[-1], X2 = Vec[-length(Vec)])

R reports the following coefficient estimates:

coef(lm(X1~X2, DF))  ### slope =  -0.03871511

Then I compute the slope estimate manually:

(sum(DF[,1]*DF[,2])*nrow(DF) - sum(DF[,1])*sum(DF[,2])) / (nrow(DF) * sum(DF[,1]^2) - (sum(DF[,1])^2)) ### -0.03871178

They are close but do not match exactly.

Can you please help me understand what I am missing here?

Any pointer will be very helpful.

Solution

The problem is that X1 and X2 are switched in lm relative to the long formula.

Background

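The slope in lm(y ~ x) is given by the following formula, where x and y each have length n, x and y are short for x[i] and y[i], and the summations are over i = 1, 2, ..., n:

slope = (n * sum(x * y) - sum(x) * sum(y)) / (n * sum(x^2) - sum(x)^2)

The numerator is symmetric in x and y, but the denominator involves only the predictor x.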

Source of the problem

Thus the long formula in the question, also shown in (1) below, corresponds to lm(X2 ~ X1, DF) and not to lm(X1 ~ X2, DF), because its denominator uses DF[, 1], which is X1. Either change the formula in the lm model as in (1) below, or change the long formula in the question by replacing each occurrence of DF[, 1] in the denominator with DF[, 2] as in (2) below.

# (1)

coef(lm(X2 ~ X1, DF))[[2]]
## [1] -0.03871178

(sum(DF[,1]*DF[,2])*nrow(DF) - sum(DF[,1])*sum(DF[,2])) / 
  (nrow(DF) * sum(DF[,1]^2) - (sum(DF[,1])^2))  # as in question
## [1] -0.03871178

# (2)

coef(lm(X1 ~ X2, DF))[[2]]  # as in question
## [1] -0.03871511

(sum(DF[,1]*DF[,2])*nrow(DF) - sum(DF[,1])*sum(DF[,2])) / 
  (nrow(DF) * sum(DF[,2]^2) - (sum(DF[,2])^2))
## [1] -0.03871511
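
To make the role of each variable harder to mix up, the long formula can be wrapped in a small helper that names the response and the predictor explicitly. This is only a sketch: ols_slope is a hypothetical name introduced here, not a function from base R or any package, and it assumes a simple regression with an intercept and a single predictor. The last check uses the equivalent form cov(x, y) / var(x).

```r
# Hypothetical helper: closed-form slope for the regression of y on x.
# Only the predictor x appears in the denominator.
ols_slope <- function(y, x) {
  n <- length(x)
  (n * sum(x * y) - sum(x) * sum(y)) / (n * sum(x^2) - sum(x)^2)
}

# Same data as in the question
set.seed(1)
Vec <- rnorm(1000, 100, 3)
DF <- data.frame(X1 = Vec[-1], X2 = Vec[-length(Vec)])

# Swapping response and predictor changes the denominator, hence the slope
slope_X1_on_X2 <- ols_slope(y = DF$X1, x = DF$X2)
slope_X2_on_X1 <- ols_slope(y = DF$X2, x = DF$X1)

# Each should agree with the matching lm() call (up to floating-point tolerance)
all.equal(slope_X1_on_X2, coef(lm(X1 ~ X2, DF))[[2]])
all.equal(slope_X2_on_X1, coef(lm(X2 ~ X1, DF))[[2]])

# Equivalent form: sample covariance divided by the predictor's variance
all.equal(slope_X1_on_X2, cov(DF$X2, DF$X1) / var(DF$X2))
```

The two slopes differ only because var(X1) and var(X2) differ slightly: the two columns are the same vector with a different end point dropped.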