Hi, I have several columns that represent scores. I want to estimate one simple linear model per SCORE column, each with that SCORE as a function of STUDYTIME, so there are as many models as there are SCORE columns. Then I want to store the STUDYTIME coefficients in a new column whose row names are the SCORE column names. Finally, I am not sure how to do clustering in these linear models, because each STUDENT appears in the data twice.
Here is a reproducible example. This is the data I have now:
df <- data.frame(replicate(5, rnorm(10)))
df[1]<-c(1,1,2,2,3,3,4,4,5,5)
colnames(df) <- c('student','studytime', 'score1','score2','score3')
This is my attempt at the code:
for (i in 1:nrow(df)) {
dfx <- df[,i]
lm <- lm(dfx[,3:5] ~ study_time)
resdat[,i] = summary(lm)$coefficients[2]
}
You can do this simply with the lapply and sapply functions.
Here is the R code:
Generating Data
df <- data.frame(replicate(5, rnorm(10)))
df[1]<-c(1,1,2,2,3,3,4,4,5,5)
colnames(df) <- c('student','studytime', 'score1','score2','score3')
Storing Results
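# Fit one simple regression per score column (all columns except student and studytime)
Results <- lapply(df[, -c(1, 2)], FUN = function(x) lm(x ~ df$studytime))
# Coefficient matrix: one column per score; rows are the intercept and the studytime slope
Coef <- sapply(Results, FUN = coefficients)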
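The question also asks for only the studytime coefficient, stored with the score names as row names, and for clustering on student. A minimal base-R sketch of one way to do both is below. The names score_cols, models, cluster_se and res are hypothetical, the seed is an assumption added only for reproducibility, and the clustered standard errors use a hand-written sandwich estimator with the common CR1 small-sample correction rather than any particular package. With only five students the cluster-robust errors are rough, so treat this as an illustration.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
df <- data.frame(replicate(5, rnorm(10)))
df[1] <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
colnames(df) <- c('student', 'studytime', 'score1', 'score2', 'score3')

# All score columns, selected by name
score_cols <- grep('^score', names(df), value = TRUE)

# One model per score column; reformulate() builds e.g. score1 ~ studytime
models <- lapply(score_cols, function(v) {
  lm(reformulate('studytime', response = v), data = df)
})
names(models) <- score_cols

# Hypothetical helper: cluster-robust (CR1) standard errors computed by hand
cluster_se <- function(model, cluster) {
  X <- model.matrix(model)
  u <- residuals(model)
  bread <- solve(crossprod(X))
  # Sum the score contributions x_i * u_i within each cluster
  scores <- rowsum(X * u, group = cluster)
  meat <- crossprod(scores)
  G <- nrow(scores); n <- nrow(X); k <- ncol(X)
  adj <- (G / (G - 1)) * ((n - 1) / (n - k))  # small-sample correction
  setNames(sqrt(diag(adj * bread %*% meat %*% bread)), colnames(X))
}

# One row per score column: studytime slope and its clustered standard error
res <- data.frame(
  estimate   = sapply(models, function(m) coef(m)[['studytime']]),
  cluster_se = sapply(models, function(m) cluster_se(m, df$student)[['studytime']]),
  row.names  = score_cols
)
res
```

Here res has row names score1, score2 and score3. The estimate column holds the same slopes as the second row of Coef above; only the standard errors change when you cluster on student.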