I am trying to run a PCA on some bioclimatic variables in R, specifically the current and future projections of the bioclimatic variables from worldclim.org.
The problem is that prcomp can only work with one RasterStack, and I want it to work on two RasterStacks at the same time. The two stacks have exactly the same set of variables (names and extent) but different cell values.
I do have a roundabout way to solve this: shift the extent of the future raster layers and merge them with the current ones into one larger set of raster layers. But this gives me a lot of projection issues with other data that I wish to use alongside this.
I understand this is not entirely clear, but in short: I want to run one PCA on two RasterStacks that have the same variables and the same coordinates, without having to shift the extent.
Thank you!
Edit: I could not figure out how to share the data or create example data, so I did not initially include example code. I will try to be clearer here.
filesC # location of bioclimate files for current
[1] "bio01.asc" "bio02.asc" "bio03.asc" "bio04.asc" "bio05.asc"
filesF # location of bioclimate files for the future
[1] "bio01.asc" "bio02.asc" "bio03.asc" "bio04.asc" "bio05.asc"
# note they have the exact same variables.
rasC <- stack(filesC)
rasF <- stack(filesF)
rasC@extent
#class : Extent
#xmin : 116.95
#xmax : 126.6
#ymin : 4.65
#ymax : 21.11667
rasF@extent
#class : Extent
#xmin : 116.95
#xmax : 126.6
#ymin : 4.65
#ymax : 21.11667
# and exact same extent
So currently I am doing this.
pcaC <- prcomp(rasC, scale = T)
FutPCs <- predict(rasF, pcaC)
# creating PCs of rasF based on the pca from rasC
However, I would like to apply the PCA to both RasterStacks simultaneously, building a "PCA formula" based on both the current and the future bioclimatic variables, like so:
pca <- prcomp(rasC, rasF, scale = T)
CurPCs <- predict(rasC, pca)
FutPCs <- predict(rasF, pca)
Hope this is clearer!
To people who come across this post with similar issues: I have found a simple answer to this problem.
prcomp does not work directly on the rasters themselves, but rather on a matrix of (usually randomly sampled) points from the raster. To combine the PCA for the two RasterStacks, I simply sampled an equal number of points from each stack and bound them together.
rasC <- stack(rasC)
rasF <- stack(rasF)
srC <- sampleRandom(rasC, 10000)
srF <- sampleRandom(rasF, 10000)
srCF <- rbind(srC,srF)
pcaCF <- prcomp(srCF, scale. = TRUE)
From there, I can predict the new PCs for each stack from the combined PCA of the two datasets. I did not realise this step was needed, but at least I think it is solved!
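For anyone who wants to try the whole workflow without the WorldClim files, here is a minimal sketch of the sampling, combined PCA, and prediction steps. The two stacks are synthetic stand-ins built by a hypothetical helper (make_stack), the sample size of 500 is only chosen to suit the small toy grid, and index = 1:3 is an arbitrary choice of how many components to keep (without index, predict returns only the first one).

```r
library(raster)

set.seed(42)

# Hypothetical stand-in for the real data: a 5-layer stack of random values
# with the same extent and layer names as the bioclim stacks in the question.
make_stack <- function(offset) {
  layers <- lapply(1:5, function(i) {
    r <- raster(nrows = 50, ncols = 30,
                xmn = 116.95, xmx = 126.6, ymn = 4.65, ymx = 21.11667)
    values(r) <- rnorm(ncell(r), mean = i + offset)
    r
  })
  s <- stack(layers)
  names(s) <- sprintf("bio%02d", 1:5)
  s
}

rasC <- make_stack(0)  # "current" climate (synthetic)
rasF <- make_stack(1)  # "future" climate (synthetic)

# Sample the same number of cells from each stack and bind the rows together.
srC <- sampleRandom(rasC, 500)
srF <- sampleRandom(rasF, 500)
pcaCF <- prcomp(rbind(srC, srF), scale. = TRUE)

# Project both stacks onto the shared components.
# 'index' selects which components are written to the output layers.
CurPCs <- predict(rasC, pcaCF, index = 1:3)
FutPCs <- predict(rasF, pcaCF, index = 1:3)
```

Because both stacks are projected with the same rotation, centring, and scaling, the resulting CurPCs and FutPCs layers are directly comparable and keep the original extent.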