So I'm taking a class for R, and I'm having a really hard time coding basic formulas.
Basically, I'm trying to compute three quantities (D1, D2 and D3 in the picture), but I keep getting errors. (I've attached a picture for easier presentation.)
Note:
d is the number of DOF, d=1,...,20
set.seed(29)
library(ISLR)
library(splines)
#### STEP 1
x <- runif(1000,min=0,max=10)
lambda=(2*x)+(0.2*x*sin(x))
y <- rpois(1000,lambda)
J <- data.frame(x=x, y=y)
plot(x,y,cex=0.4)
### STEP 2
ajust <- matrix(NA,20,1000)
for(i in (1:20)) {
smoothing=lm(y~ns(x=x,df=i),data=J)
ajust[i,]=predict(smoothing)
}
fd=function(d) {return(smoothing[d])}
for(i in (1:20)) {
lines(x,ajust[i,],col=i)
}
lines(x,lambda,col='black')
for(i in (1:20)) {
d1<- (1/1000)*sum((y-ajust[i,])**2)
}
### Calcul de D2 (computing D2)
Mean=lambda
for (d in (1:20)){
W=(Mean-fd(x))**2
d2=sum(W)/1000
}
It works up until "Calcul de D2", where I get a "non-numeric argument to binary operator" error, and I don't understand how to make it work. I know my question might seem a bit vague, so don't hesitate to let me know if something isn't clear.
The bug is that your fd(x) call returns a list, which, as the error says, is not numeric: fd is defined as smoothing[d], and single-bracket indexing of an lm object gives back a list. We don't have information on what fd should be (it's not defined in the picture or the question), but it seems the solution is to extract from fd(x) whichever component you meant to subtract from Mean.
For example:
for (d in (1:20)) {
  # fd(x) is a list; $fitted.values pulls a numeric vector out of it
  W=(Mean-fd(x)$fitted.values)**2
  d2=sum(W)/1000
}
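To see why the original subtraction fails, here is a minimal sketch on made-up data. The names toy, fit, sub, err and fv are hypothetical and exist only for this illustration; the point is that single-bracket indexing of an lm object keeps the list wrapper, while $fitted.values (or fitted()) gives a plain numeric vector.

```r
# Hypothetical toy data, used only to illustrate the indexing behaviour
set.seed(1)
toy <- data.frame(x = 1:10, y = (1:10) + rnorm(10))
fit <- lm(y ~ x, data = toy)

sub <- fit[1]        # single brackets return a list (here of length 1)
is.list(sub)         # TRUE
is.numeric(sub)      # FALSE

# Arithmetic between a numeric vector and a list raises an error;
# tryCatch captures the message instead of stopping the script
err <- tryCatch(toy$y - sub, error = function(e) conditionMessage(e))
err

# Extracting the component gives a numeric vector that can be subtracted
fv <- fit$fitted.values
is.numeric(fv)
mean((toy$y - fv)^2)
```

The same reasoning applies to smoothing[d] in the question: whatever d is, the result is still a list, so a component has to be extracted before doing arithmetic.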
Update
I saw your follow-up comment/question regarding "D3" from the equations in the picture. I'm a little unsure because I don't have the textbook or context to be certain of the notation (X isn't formally defined, and I also had to take a leap of faith that Y in the picture is Mean in the code, based on how you used it). This is my best guess, based on that context:
# The equation for d3 is the expected value of (Y-fd(X))^2.
#
# I don't know the context of this, but I see the definition of d1 and d2.
#
# D1 = for(i in (1:20)) {
# d1<- (1/1000)*sum((y-ajust[i,])**2)
# }
d1 # [1] 10.04203
#
# D2 = for (d in (1:20)){
# W=(Mean-fd(x)$fitted.values)**2
# d2=sum(W)/1000
# }
#
d2 # [1] 0.2024568
#
# Based on that, Y = Mean, y = y, x=x, i=i, N=1000
# W = (Y - fd(xi))^2
# I presume X = vectorized xi
#
# So, D3 =
D3 = (Mean - fd(x)$fitted.values)^2
# Since it's an expected value, I presume we take the mean
D3 = mean(D3)
Where I may be guessing wrong is probably X. X in the pictured equation looks like the vector of all x[i]. But each element of x is an x[i], so x is already the vector representation thereof.
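One more observation on the loops: each pass overwrites d1 and d2, and smoothing only holds the last fit (df = 20), so the values printed above describe that single model. If the intent is one value per d, a possible sketch follows. It assumes that fd means "the natural-spline fit with d degrees of freedom, evaluated at the training x", which is my reading and is not confirmed by the picture; fits is a hypothetical name, and fd is redefined here to take d rather than x.

```r
library(splines)

set.seed(29)
n <- 1000
x <- runif(n, min = 0, max = 10)
lambda <- 2 * x + 0.2 * x * sin(x)   # true mean, called Mean in the question
y <- rpois(n, lambda)
J <- data.frame(x = x, y = y)

# Keep one fit per degrees-of-freedom value d = 1, ..., 20
fits <- lapply(1:20, function(d) lm(y ~ ns(x, df = d), data = J))

# Assumed meaning of fd: fitted values of the d-df model (numeric vector)
fd <- function(d) fitted(fits[[d]])

# One value per d instead of a single overwritten scalar
d1 <- sapply(1:20, function(d) mean((y - fd(d))^2))       # against observed y
d2 <- sapply(1:20, function(d) mean((lambda - fd(d))^2))  # against the true mean

# Degrees of freedom whose fit is closest to the true mean on this sample
which.min(d2)
```

With d1 and d2 stored as vectors of length 20, plotting them against 1:20 shows how each quantity changes with the number of degrees of freedom.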