I'm trying to do leave-one-out cross-validation on a relatively small dataset (n = 22, p = 17) for a linear regression fitted with the LARS algorithm. Essentially I need to create n matrices of standardized data, each with one row left out (every column is centered by its mean and scaled by its SD).
I've never used lists before, but would be open to making lists as long as columns of the different matrices can be manipulated/standardized.
Here's what I tried in R:
for (i in 1:n)
{
  x.standardized.i <- matrix(data = NA, nrow = (n-1), ncol = p) #creates n matrices, all n-1 x p
  for (j in 1:p)
  {
    x.standardized.i[,j] <- ((x[-i,j]-mean(x[-i,j]))/sd(x[-i,j])) #and standardizes the p variables with the ith row missing in each n matrix (i increments from 1 to n)
  }
}
I'm not sure if I can share the data, since it's related to grades from a class, but when I run the code it goes through the whole loop and leaves me with a single matrix, x.standardized.i, which is the standardized data with the last row missing.
You can do this quite simply with sapply and scale:
# Create dummy data
m <- matrix(runif(200), ncol=10)
# Leave each row out in turn, and scale each column
A <- sapply(seq_len(nrow(m)), function(i) scale(m[-i, ]), simplify='array')
By default, scale centres each column on its mean, and divides by its sd.
For the example above, you'll end up with an array with 19 rows, 10 columns and 20 slices.
To access particular slices (i.e. cross-validation training folds), you can subset like this:
A[,, 1] # all rows, all cols, first slice
A[,, 10] # all rows, all cols, tenth slice
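As a sanity check, slice i of the array should match the column-by-column standardisation written out in the question's loop. The sketch below assumes the same kind of dummy data as above; the seed, the choice of i = 3 and the name manual are arbitrary and only for illustration.

```r
set.seed(1)                         # arbitrary seed, only for reproducibility
m <- matrix(runif(200), ncol = 10)  # 20 rows, 10 columns of dummy data
A <- sapply(seq_len(nrow(m)), function(i) scale(m[-i, ]), simplify = "array")

i <- 3                              # any fold index would do
# Standardise each column by hand, with row i left out (as in the question)
manual <- apply(m[-i, ], 2, function(col) (col - mean(col)) / sd(col))

dim(A)                              # rows, columns, slices
isTRUE(all.equal(A[, , i], manual)) # compares the slice with the manual version
```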
To confirm that every column in every slice has mean 0 (up to floating-point error) and sd 1:
apply(A, c(2, 3), mean)
apply(A, c(2, 3), sd)
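Since the question mentions lists: the original loop reuses the single name x.standardized.i on every pass, so only the last matrix survives, whereas a list keeps one element per fold. A list also preserves the "scaled:center" and "scaled:scale" attributes that scale attaches to its result, which the array version drops. The sketch below is one possible way to use them, assuming you want the held-out row put on the same scale as its training fold; the names folds, train, test, centre and spread are hypothetical.

```r
set.seed(1)                         # arbitrary seed, only for reproducibility
m <- matrix(runif(200), ncol = 10)  # dummy data: 20 rows, 10 columns

folds <- lapply(seq_len(nrow(m)), function(i) {
  train  <- scale(m[-i, , drop = FALSE])    # standardise with row i left out
  centre <- attr(train, "scaled:center")    # column means of the training rows
  spread <- attr(train, "scaled:scale")     # column sds of the training rows
  test   <- (m[i, ] - centre) / spread      # held-out row on the training scale
  list(train = train, test = test)
})

length(folds)            # one element per left-out row
dim(folds[[1]]$train)    # training matrix for the first fold
folds[[1]]$test          # standardised held-out row for the first fold
```

Each fold is then available as folds[[i]]$train and folds[[i]]$test, and the columns of any training matrix can still be manipulated as with an ordinary matrix.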