I'm experiencing an awkward issue while trying to run a principal component analysis on my data. I've tried to use prcomp (base) and rda (vegan), but the analysis is considering columns as sample units instead of rows, which causes all sorts of issues with the analysis.
The following code is a simplification of my data. The actual dataset is composed of nearly 2000 columns and around 350 rows. However, the issue is the same when I run the script below:
library(vegan)                            # provides rda()
rn <- rnorm(80 * 1000)                    # one value per cell: 80 rows x 1000 columns
dt <- matrix(rn, nrow = 80, ncol = 1000)
result <- rda(dt, scale = TRUE)
summary(result)
At first I thought this would be a common error, but I couldn't find any similar issues or solutions to it.
Is there a way to clearly specify which dimension to use as sample units?
Whilst you can perform PCA on a data set with more variables, p, than observations, n, using the SVD method, there are at most n principal components, or n-1 if the data are centred.
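A minimal sketch of that limit using base R's prcomp(), assuming simulated Gaussian data with the same shape as the example (80 rows, 1000 columns); the object names here are illustrative only:

```r
set.seed(1)                               # reproducible simulated data
n <- 80; p <- 1000                        # n observations (rows), p variables (columns)
x <- matrix(rnorm(n * p), nrow = n, ncol = p)

pca <- prcomp(x, center = TRUE, scale. = TRUE)

n_returned <- length(pca$sdev)            # prcomp() returns min(n, p) components
n_nonzero  <- sum(pca$sdev > 1e-8)        # components with non-negligible variance

n_returned   # 80
n_nonzero    # 79, i.e. n - 1 because the data were centred
```

The last component is still returned, but its standard deviation is zero to machine precision, which is the n - 1 limit showing up.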
If you dig into the results from the PCA you fitted, you'll see that it considered all variables and that they remained as variables:
> r2 <- rda(dt, scale=T)
> dim(scores(r2, display = 'species'))
[1] 1000 2
'species' is vegan's way of referring to the variable loadings; there are 1000 variables.
Compare with prcomp(), which also uses SVD:
> r1 <- prcomp(dt, scale = TRUE)
> dim(scores(r1, display = 'species'))
[1] 1000 80
Again there are 1000 variables, this time with 80 principal components. The difference (80 here versus 2 earlier) is just down to the default for choices, i.e. which axes to extract scores for.
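To check directly which dimension is treated as the sample units, and to ask for more axes than the default, something like the following should work. It is a sketch that assumes the vegan package is installed and uses simulated data of the same shape as the example; the object names are illustrative:

```r
library(vegan)

set.seed(1)
dt <- matrix(rnorm(80 * 1000), nrow = 80, ncol = 1000)
r2 <- rda(dt, scale = TRUE)

# 'sites' are the sample units (rows of dt); 'species' are the variables (columns)
site_scores    <- scores(r2, display = "sites")
species_scores <- scores(r2, display = "species")
dim(site_scores)      # 80 2
dim(species_scores)   # 1000 2

# choices selects which axes to return, here the first five
species_5 <- scores(r2, display = "species", choices = 1:5)
dim(species_5)        # 1000 5
```

If the samples were stored in columns instead, transposing with t(dt) before calling rda() would swap the roles.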