I have a data frame where rows are years (~30 years) and sites (~300 sites) and columns are species abundances (~200 spp).
The data looks like:
Site | Year | Specie 1 | Specie 2 | Species n |
---|---|---|---|---|
Site 1 | Year 1 | 0.4 | 0.0 | 0.0 |
Site 1 | Year 2 | 0.8 | 0.5 | 1.0 |
Site 1 | Year 3 | 0.0 | 0.7 | 1.3 |
Site 1 | Year 4 | 0.0 | 0.4 | 1.6 |
Site 2 | Year 1 | 1.2 | 0.1 | 0.4 |
Site 2 | Year 2 | 1.0 | 0.0 | 0.5 |
Site 2 | Year 3 | 1.0 | 0.0 | 0.4 |
Site 3 | Year 1 | 2.0 | 0.0 | 1.0 |
Site 3 | Year 2 | 1.0 | 0.0 | 0.8 |
Site 3 | Year 3 | 0.5 | 0.0 | 1.0 |
Site 3 | Year 4 | 0.0 | 0.0 | 0.3 |
*Note the dataset has inconsistent year surveys.
What I need is to build a nested loop of time series (years) inside a site loop. Then, run temporal pairwise comparisons (e.g., using betapart or vegan packages in R) by two different scenarios: 1) anchoring the first year (year 1) of each site and comparing the subsequent years (i.e., yr1 vs yr2; yr1 vs yr3; yr1 vs yr4...), and 2) comparing dissimilarities between successive years (i.e., yr1 vs yr2; yr2 vs yr3; yr3 vs yr4...).
I do not have any final code but think I need to run something like this (I am sure is very much more complex than this):
# prior need to define empty matrix to beta outputs
beta.matrix <- data.frame(curp = df$Site, Year = df$Year, beta.bray.bal=NA, beta.bray.gra=NA, beta.bray=NA)
# nested loop pairwise dissimilarities
for(i in 1:unique(df$Site)){ # outer loop are sites
for(j in 1:(nrow(df$Year))){ # the inner loop are years and will run one time for each iteration of the outer loop.
beta.x <- beta.pair.abund(x, index.family="bray") # this is the pairwise function in betapart package in R
#the function *beta.pair.abund* returns three outputs that I expect to store in the empty matrix I defined previusly: https://cran.r-project.org/web/packages/betapart/betapart.pdf
beta.matrix$beta.bray.bal[i]<-as.matrix(beta.x[[1]])[unique(df$Site)+i, i]
beta.matrix$beta.bray.gra[i]<-as.matrix(beta.x[[2]])[unique(df$Site)+i, i]
beta.matrix$beta.bray[i]<-as.matrix(beta.x[[3]])[unique(df$Site)+i, i]
}
}
Any thoughts on how to do it will be very well appreciated.
This solution is based on the sample data you provided. Some modifications to the code will be required for it to work on your full dataset:
library(betapart)
df <- data.frame(Site = paste(rep("Site", 11), c(1,1,1,1,2,2,2,3,3,3,3)),
Year = paste(rep("Year", 11), c(1,2,3,4,1,2,3,1,2,3,4)),
`Specie 1` = c(0.4, 0.8, 0.0, 0.0, 1.2, 1.0, 1.0, 2.0, 1.0, 0.5, 0.0),
`Specie 2` = c(0.0, 0.5, 0.7, 0.4, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
`Specie 3` = c(0.0, 1.0, 1.3, 1.6, 0.4, 0.5, 0.4, 1.0, 0.8, 1.0, 0.3),
check.names=FALSE)
# Define empty matrix for beta outputs (note this differs from
# your example as it has three added columns to accommodate
# both "subsequent" and "successive" outputs)
beta.matrix <- data.frame(curp = df$Site,
Year = df$Year,
subs.beta.bray.bal=NA,
subs.beta.bray.gra=NA,
subs.beta.bray=NA,
succ.beta.bray.bal=NA,
succ.beta.bray.gra=NA,
succ.beta.bray=NA)
# Create loop vector for every Site
for(i in unique(df$Site)) {
# Get row index for first value of each group e.g. "Year 1"
x <- min(which(df$Site == i))
# Counter, only needed for calculating "successive years"
y <- 0
# Create loop vector one less than site observations
# e.g. exclude "Year 1"
for(j in 1:(length(which(df$Site == i))-1)) {
# Calculate beta.pair.abund for x and x+j (subsequent years)
# ATTENTION: edit column index "3:5" to match your actual data
# e.g. df[c(x,x+j),3:200]
beta.subs <- beta.pair.abund(df[c(x,x+j),3:5], index.family="bray")
# Add result to beta.matrix (leave the column index as is)
beta.matrix[x+j,3:5] <- do.call(cbind, beta.subs)
# Calculate beta.pair.abund for x+y and x+j (successive years)
# ATTENTION: edit column index "3:5" to match your actual data
# e.g. df[c(x+y, x+j),3:200]
beta.succ <- beta.pair.abund(df[c(x+y, x+j),3:5], index.family="bray")
# Add result to beta.matrix
beta.matrix[x+j,6:8] <- do.call(cbind, beta.succ)
# Increase "counter" by 1 for each inner loop iteration
y <- y + 1
}
}
beta.matrix
# curp Year subs.beta.bray.bal subs.beta.bray.gra subs.beta.bray succ.beta.bray.bal succ.beta.bray.gra succ.beta.bray
# 1 Site 1 Year 1 NA NA NA NA NA NA
# 2 Site 1 Year 2 0.00000000 0.70370370 0.70370370 0.00000000 0.70370370 0.70370370
# 3 Site 1 Year 3 1.00000000 0.00000000 1.00000000 0.25000000 0.05232558 0.30232558
# 4 Site 1 Year 4 1.00000000 0.00000000 1.00000000 0.15000000 0.00000000 0.15000000
# 5 Site 2 Year 1 NA NA NA NA NA NA
# 6 Site 2 Year 2 0.06666667 0.05833333 0.12500000 0.06666667 0.05833333 0.12500000
# 7 Site 2 Year 3 0.00000000 0.09677419 0.09677419 0.00000000 0.03448276 0.03448276
# 8 Site 3 Year 1 NA NA NA NA NA NA
# 9 Site 3 Year 2 0.00000000 0.25000000 0.25000000 0.00000000 0.25000000 0.25000000
# 10 Site 3 Year 3 0.00000000 0.33333333 0.33333333 0.13333333 0.07878788 0.21212121
# 11 Site 3 Year 4 0.00000000 0.81818182 0.81818182 0.00000000 0.66666667 0.66666667
Note that this will return NAs for "Year 1" for each group. This is expected behaviour.
One further point, there are better approaches than loops in R. Although your data has only ~9,000 rows and this loop should take less than one minute, R handles vectorised methods much better. Check out the tidyverse for faster, non-loop alternatives using packages such as dplyr
and tidyr
.
UPDATE:
In response to comments below, here is a loop to compare across sites by year using the same sample df as before. It involves a slightly different "beta.matrix1" df to hold the results. It also cycles through only those sites that have a corresponding "Year" e.g. in the example data, as "Site 2" has no record for "Year 4", "Site 1" only matches "Year 4" with "Site 3" etc. This solves the issue of sites with differing observation counts. This is achieved via using which()
to create the "y" vector in the following loop:
# Create empty df with 0 rows to hold results
beta.matrix1 <- data.frame(matrix(nrow = 0, ncol = 6))
# Rename columns. "Site" and "SiteN" are the pairs being matched
colnames(beta.matrix1) <- c("Site", "SiteN", "Year",
"year.beta.bray.bal",
"year.beta.bray.gra",
"year.beta.bray")
# Loop through vector of every unique site
for(i in unique(df$Site)) {
# Loop through vector of row indices for i e.g. every year
for(j in which(df$Site == i)) {
# Create vector of row indices from every other "Site" for the current
# "Year" being processed in i e.g. If "Site 1 Year 1" is the current
# value, then get every "Site n Year 1" EXCLUDING "Site 1 Year 1"
y <- which(df$Year == df$Year[j] & df$Site != i)
# Loop through every Site that has matching Year
for(k in y) {
# If a pair has already been processed...
if(paste(df$Site[k], i) %in% paste(beta.matrix1$Site, beta.matrix1$SiteN)) {
# skip to the next iteration
next
} else {
# Calculate beta.pair.abund for j and k (Site comparison).
# ATTENTION: edit column index "3:5" to match your actual data
# Note this took two steps in previous loop
beta.year <- do.call(cbind, beta.pair.abund(df[c(j,k),3:5],
index.family="bray"))
# Create list with current Sites, Year, and beta.pair.abund result
z <- c(i, df$Site[k], df$Year[j], beta.year)
# Add result to beta.matrix1
beta.matrix1[nrow(beta.matrix1)+1,] <- z
}
}
}
}
# Convert beta.year results back to numeric
beta.matrix1[,4:6] <- as.numeric(unlist(beta.matrix1[,4:6]))
beta.matrix1
# Site SiteN Year year.beta.bray.bal year.beta.bray.gra year.beta.bray
# 1 Site 1 Site 2 Year 1 0.00000000 0.61904762 0.61904762
# 2 Site 1 Site 3 Year 1 0.00000000 0.76470588 0.76470588
# 3 Site 1 Site 2 Year 2 0.13333333 0.18245614 0.31578947
# 4 Site 1 Site 3 Year 2 0.11111111 0.10840108 0.21951220
# 5 Site 1 Site 2 Year 3 0.71428571 0.05042017 0.76470588
# 6 Site 1 Site 3 Year 3 0.33333333 0.09523810 0.42857143
# 7 Site 1 Site 3 Year 4 0.00000000 0.73913043 0.73913043
# 8 Site 2 Site 3 Year 1 0.05882353 0.26032541 0.31914894
# 9 Site 2 Site 3 Year 2 0.00000000 0.09090909 0.09090909
# 10 Site 2 Site 3 Year 3 0.35714286 0.02216749 0.37931034
Note that this will ignore pairs that have already been processed e.g. if "Site 1 Year 1 / Site 2 Year 1" exists, then "Site 2 Year 1 / Site 1 Year 1" will not be added. This ensures only unique pairs are recorded. Further, it's not ideal that the beta.pair.abund()
results have to be coerced back to numeric and other approaches would avoid this. However, as it stands, the results seem robust.