I've been running a linear discriminant analysis (LDA) on the results of a principal components analysis in R. Following the methodology of some previous studies, I choose the number of PCs as the smallest number that represents a certain threshold of cumulative variation and returns the highest reclassification rate.
I have been calculating the reclassification rate for each cumulative number of PCs in a loop, but I want the results as a single data.frame for an RMarkdown report. This is the code I have been using.
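library(MASS)  # provides lda()

for (j in 1:21) {
  # Cross-validated LDA on the first j PCs (column 1 is locus, so it is dropped)
  vars <- sum(diag(prop.table(table(
    trainingframe$locus,
    lda(data.frame(trainingframe[-c(1)])[1:j], grouping = trainingframe$locus,
        CV = TRUE, prior = c(1, 1, 1) / 3)$class))))
  print(data.frame(j, vars))
}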
In this code, trainingframe is the training dataset and locus is the categorical variable that the lda is classifying. The first column is not selected because it is the locus. I can't provide the original data, but this should be reproducible on any dataset that contains the principal components of a number of variables plus a categorical variable of interest for classification.
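Since the original data can't be shared, here is a minimal sketch of a stand-in trainingframe with the same layout (grouping variable in column 1, PC scores after it). It uses the built-in iris data, which is an assumption for illustration only and not the dataset behind the numbers in this post; iris only yields 4 PCs, so the loop range would need to be 1:4 rather than 1:21.

```r
# Stand-in data (assumption): iris replaces the real dataset for illustration.
# PCA on the four numeric measurements, centred and scaled.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Same layout as described above: locus first, then the PC scores.
trainingframe <- data.frame(locus = iris$Species, pca$x)

str(trainingframe)
```

Species has three levels, so the prior c(1,1,1)/3 used in the loop still applies to this stand-in.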
These are the results I get from the script.
j vars
1 1 0.512605
j vars
1 2 0.5882353
j vars
1 3 0.7058824
j vars
1 4 0.6806723
j vars
1 5 0.6722689
j vars
1 6 0.6638655
j vars
1 7 0.6722689
j vars
1 8 0.6386555
j vars
1 9 0.6470588
j vars
1 10 0.6554622
j vars
1 11 0.6554622
j vars
1 12 0.7226891
j vars
1 13 0.7142857
j vars
1 14 0.6890756
j vars
1 15 0.6806723
j vars
1 16 0.6806723
j vars
1 17 0.6890756
j vars
1 18 0.6554622
j vars
1 19 0.6470588
However, as you can see, the loop prints a separate data frame for each result rather than one data frame containing the results of all of the analyses.
What I'd like to produce is a data.frame like the following...
j vars
1 0.5126050
2 0.5882353
3 0.7058824
4 0.6806723
5 0.6722689
6 0.6638655
7 0.6722689
8 0.6386555
9 0.6470588
10 0.6554622
11 0.6554622
12 0.7226891
13 0.7142857
14 0.6890756
15 0.6806723
16 0.6806723
17 0.6890756
18 0.6554622
19 0.6470588
I am trying to figure out a way to rewrite the above code to produce the last data frame shown here.
We can initialize an empty data.frame and then rbind each result onto it instead of printing:
d1 <- data.frame(j = integer(), vars = numeric())
for (j in 1:21) {
  vars <- sum(diag(prop.table(table(
    trainingframe$locus,
    lda(data.frame(trainingframe[-c(1)])[1:j],
        grouping = trainingframe$locus, CV = TRUE,
        prior = c(1, 1, 1) / 3)$class))))
  # Append this iteration's row to the accumulated results
  d1 <- rbind(d1, data.frame(j, vars))
}
Another way to write the code is with lapply, binding the rows once at the end:
out <- do.call(rbind, lapply(1:21, function(j) {
cls <- lda(data.frame(trainingframe[-1])[seq_len(j)],
grouping = trainingframe$locus, CV = TRUE,
prior = c(1, 1, 1)/3)$class
vars <- sum(diag(prop.table(table(trainingframe$locus, cls))))
data.frame(j, vars)
}))
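A third variant, as a minimal sketch: compute one rate per j with vapply and build the data.frame in a single step, which avoids growing an object inside a loop. The iris-based trainingframe, and the names pcs, n_pcs, rates and results, are assumptions for illustration and are not from the question. The rate is computed as the share of matching labels, which should equal sum(diag(prop.table(table(...)))) as long as no predicted class is NA.

```r
library(MASS)  # provides lda()

# Stand-in data (assumption): iris replaces the real dataset for illustration.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
trainingframe <- data.frame(locus = iris$Species, pca$x)

pcs   <- trainingframe[-1]       # PC scores only (column 1 is locus)
n_pcs <- seq_len(ncol(pcs))      # 1..4 here; 1:21 in the question

# One leave-one-out reclassification rate per number of PCs
rates <- vapply(n_pcs, function(j) {
  cls <- lda(pcs[seq_len(j)], grouping = trainingframe$locus,
             CV = TRUE, prior = c(1, 1, 1) / 3)$class
  mean(cls == trainingframe$locus)  # share of correctly reclassified rows
}, numeric(1))

results <- data.frame(j = n_pcs, vars = rates)
results
```

Because results is an ordinary data.frame, it can be printed directly in an RMarkdown chunk or passed to a table formatter of your choice.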