Search code examples
rstatisticsestimation

OLS estimator in R


I'm trying to compute OLS estimators manually in R for given vectors and matrices, but when I get the formula beta=(x'x)^-1(x'y), R tells me that the is a dimension issue, and I can't figure out why.

My code is

nr = 100
nc = 1000 
x=matrix(rnorm(nr * nc, mean=1, sd=1), nrow = nr)
epsilon=matrix(rnorm(nr * nc, mean=0, sd=1), nrow = nr)
k=c(1,2,4,8)

eta1=((epsilon^1-mean(epsilon^1))/(mean(epsilon^(1*2))-mean(epsilon^1)^2)^(1/2))
eta2=((epsilon^2-mean(epsilon^2))/(mean(epsilon^(2*2))-mean(epsilon^2)^2)^(1/2))
eta4=((epsilon^4-mean(epsilon^4))/(mean(epsilon^(4*2))-mean(epsilon^4)^2)^(1/2))
eta8=((epsilon^8-mean(epsilon^8))/(mean(epsilon^(8*2))-mean(epsilon^8)^2)^(1/2))


y1=x+eta1
y2=x+eta2
y4=x+eta4
y8=x+eta8

beta1=inv(t(x)*x)*(t(x)*y1)
beta2=inv(t(x)*x)*(t(x)*y2)
beta4=inv(t(x)*x)*(t(x)*y4)
beta8=inv(t(x)*x)*(t(x)*y8)


Also, I feel that there should be a way to loop through the values of k to get this automated, instead of doing each eta by hand. So, a bit of help in this area would also be appreciated.

The ouput I'm looking for is to get a vector of beta for each of the different values of k.


Solution

  • You have got several issues. Firstly, you should have nx1 matrix for y and epsilon, but you have nxm matrix for them instead. Secondly, you should use matrix multiplication which is %*% in R. i.e. t(x)%*%y1. However you use dot product (*) instead.

    For the sake of simplicity, lets create a matrix with 5 columns. My approach is creating a dependent variable which is related with x columns (independent variables or feature matrix in machine learning terminology)

    nr = 100
    nc = 5
    x=matrix(rnorm(nr * nc, mean=1, sd=1), nrow = nr)
    epsilon=matrix(rnorm(nr, mean=0, sd=1), nrow = nr) # it should be nx1
    k=c(1,2,4,8)
    
    eta1=((epsilon^1-mean(epsilon^1))/(mean(epsilon^(1*2))-mean(epsilon^1)^2)^(1/2))
    eta2=((epsilon^2-mean(epsilon^2))/(mean(epsilon^(2*2))-mean(epsilon^2)^2)^(1/2))
    eta4=((epsilon^4-mean(epsilon^4))/(mean(epsilon^(4*2))-mean(epsilon^4)^2)^(1/2))
    eta8=((epsilon^8-mean(epsilon^8))/(mean(epsilon^(8*2))-mean(epsilon^8)^2)^(1/2))
    

    To check the output we should create y values wisely. So, let's define some betas and create y values wrt them. At the end, we can compare the output with the inputs we defined. Note that, you should have 5 betas for 5 columns.

    # made up betas
    beta1_real <- 1:5
    beta2_real <- -4:0
    beta4_real <- 7:11
    beta8_real <- seq(0.25,1.25,0.25)
    

    To create the y values,

    y1=  10 + x %*% matrix(beta1_real) + eta1
    y2=  20 + x %*% matrix(beta2_real) + eta2
    y4=  30 + x %*% matrix(beta4_real) + eta4
    y8=  40 + x %*% matrix(beta8_real) + eta8
    

    Here, I also added a constant term for each y values. To get the constant term at the end, we should add ones at the beginning of our x matrix like,

    x <- cbind(matrix(1,nrow = nr),x)
    

    The rest is almost same with yours. Only difference is I used solve instead of inv and also I used the matrix multiplication (%*%)

    beta1=solve(t(x)%*%x)%*%(t(x)%*%y1)
    beta2=solve(t(x)%*%x)%*%(t(x)%*%y2)
    beta4=solve(t(x)%*%x)%*%(t(x)%*%y4)
    beta8=solve(t(x)%*%x)%*%(t(x)%*%y8)
    

    If we compare the outputs,

    beta1_real was,

    # [1] 1 2 3 4 5
    

    and the output of beta1 is,

    #           [,1]
    # [1,] 10.0049631
    # [2,]  0.9632124
    # [3,]  1.8987402
    # [4,]  2.9816673
    # [5,]  4.2111817
    # [6,]  4.9529084
    

    The results are similar. 10 at the beginning is the constant term I added. The difference stems from the error term applied (etas).