I have a data frame like this:
df<- data.frame(year= c(rep("2004", 10), rep("2005", 10), rep("2006", 10), rep("2007", 10)),
lev1=c("A", "B", "C", "A", "D", "E", "D", "D", "B","B","C", "A","F","E","A","B",
"A", "B","C", "A", "D", "E", "D", "D", "B","B","C", "A","F","E","A", "B", "C", "A", "D","A","F","E","A","B" ),
lev2=c("X", "Y", "Z", "X", "W", "T", "W", "W", "Y","Y","Z", "T","U","V","Y","Y",
"W", "X","T", "W", "X", "Y", "Z", "X", "W", "T", "W", "W", "Y","Y","Z", "T","U","V","Y","Y",
"W", "X","T", "W"))
I also have code that builds a list of matrices (Results), one for each year. lev1 becomes the rows and lev2 becomes the columns. Each cell holds the number of times the two levels co-occur.
sublist=NA
for (i in unique(df$year)){
sublist[i]<-list(subset(df, df[,1] == i))
print(i)
}
Results = list()
for (i in 1: length(unique(sublist))){
if (length(sublist[[i]]) > 1 & length(sublist[[i]]) > 1 ){
rows<-unique(sublist[[i]][[2]])
cols<-unique(sublist[[i]][[3]])
matrix1<- matrix(nrow = length(rows), ncol = length(cols))
df = data.frame(sublist[[i]])
for (k in 1: length(rows)){
sub_lev1<- subset(df,lev1 == rows[k])
for (j in 1:length(cols)){
sub_lev2<-subset(sub_lev1, lev2 == cols[j])
matrix1[k,j]<-length(sub_lev2[,3])
}
}
colnames(matrix1) <- cols
rownames(matrix1) <- rows
Results[[i]] = matrix1
}else{next}
}
Results
I would like to run a single function, networklevel() from library("bipartite"), on each element of the list. It returns multiple values, one for each network index. Below I do it individually for each matrix.
d1<-networklevel(Results[[2]])
d2<-networklevel(Results[[3]])
d3<-networklevel(Results[[4]])
d4<-networklevel(Results[[5]])
The output desired is a data frame that includes the year, name of the network index, and the value for each network index:
d1<-data.frame(as.list(d1))
d1<- melt(d1)
d1$year<-rep("2004", length(d1))
d2<-data.frame(as.list(d2))
d2<- melt(d2)
d2$year<-rep("2005", length(d2))
d3<-data.frame(as.list(d3))
d3<- melt(d3)
d3$year<-rep("2006", length(d3))
d4<-data.frame(as.list(d4))
d4<- melt(d4)
d4$year<-rep("2007", length(d4))
output<- rbind(d1,d2,d3, d4)
A few problems I have: 1) For some reason the loop above returns the first matrix as NULL. How do I correct this? 2) The matrices in Results are indexed 1-5 rather than by year. I would like to adjust the loop so that each matrix is indexed by the name of its year. I believe this would make it easier to create the output data frame downstream.
I have tried the following to return network indices for each element of the list, without success:
output<- lapply(mylist, FUN= function(x) networklevel(x)
I would appreciate any help running networklevel on all elements of the list at one time. By default networklevel returns multiple network indices, so I need a solution that runs networklevel and collects all of those indices for each matrix into an organized data frame that specifies the year each matrix came from. My actual dataset covers over 20 years, so I am looking for a solution that does not require handling each year/matrix separately.
Your first problem:
1) for some reason the loop above returns the first matrix as NULL. How do I correct this?
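Change sublist = NA to sublist <- NULL. When you assign into sublist by name in your first for loop, the initial NA is not removed; it stays as an unnamed first element, so sublist ends up with five elements instead of four. In your second loop that NA fails the length(sublist[[i]]) > 1 check and is skipped, so Results[[1]] is never filled in and shows up as NULL. With the NULL start, Results has four elements, so your later calls become Results[[1]] through Results[[4]].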
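A minimal sketch of the difference between the two starting values, using a toy data frame (`toy`, `sub_na` and `sub_null` are made-up names for illustration, not objects from the question):

```r
# Toy data standing in for the question's df (hypothetical values)
toy <- data.frame(year = c("2004", "2004", "2005"),
                  lev1 = c("A", "B", "A"),
                  lev2 = c("X", "Y", "X"))

# Start from NA: the NA survives as an unnamed first element
sub_na <- NA
for (i in unique(toy$year)) {
  sub_na[i] <- list(subset(toy, year == i))
}

# Start from NULL: only the per-year subsets are stored
sub_null <- NULL
for (i in unique(toy$year)) {
  sub_null[i] <- list(subset(toy, year == i))
}

length(sub_na)    # one more element than there are years
names(sub_na)     # the first name is empty
length(sub_null)  # one element per year
names(sub_null)   # the years themselves
```

Starting from `list()` instead of `NULL` should behave the same way and makes the intent a little more explicit.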
Second issue:
2) When the matrices are indexed in Results they are not indexed by year, rather 1-5. I would like to adjust the loop so that the name of the year is indexed.
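I would try something like this: names(Results) <- c("2004", "2005", "2006", "2007"). This assumes the fix above has been applied, so that Results has exactly four elements in year order. To avoid typing the years by hand, names(Results) <- names(sublist) should give the same result, because the first loop already names sublist by year.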
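If you are open to restructuring, one possible alternative is to build the matrices with split() and table(), which names the list by year automatically. This is only a sketch on toy data; `toy` and `results_by_year` are hypothetical names, and table() orders rows and columns alphabetically rather than by first appearance as your loop does.

```r
# Toy data standing in for the question's df (hypothetical values)
toy <- data.frame(year = c("2004", "2004", "2004", "2005", "2005"),
                  lev1 = c("A", "A", "B", "A", "C"),
                  lev2 = c("X", "X", "Y", "Z", "Z"))

# One co-occurrence matrix per year; split() names the list by year
results_by_year <- lapply(split(toy, toy$year), function(d) {
  # as.character() keeps only the levels present in this year,
  # even if the columns happen to be factors
  counts <- table(as.character(d$lev1), as.character(d$lev2))
  unclass(counts)  # plain integer matrix with row and column names
})

names(results_by_year)
results_by_year[["2004"]]
```

Because the list is named, each matrix can be pulled out by year, for example `results_by_year[["2005"]]`.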
Third issue:
looping output
In your lapply you do not need to wrap the call in function(x); just pass networklevel directly, and iterate over Results, like this:
output <- lapply(Results, bipartite::networklevel)
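Then you can do something like this to get it into a matrix or data frame:
#get to matrix - one row per list element, one column per network index
dfoutput <- do.call(rbind, output)
#convert to df first, so the index values stay numeric
dfoutput2 <- as.data.frame(dfoutput)
#add row names as variable - in your case it is year of analysis
#(this relies on Results being named by year; cbind-ing text onto the matrix itself would turn every value into text)
dfoutput3 <- cbind(year = row.names(dfoutput), dfoutput2)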
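The question asks for a long layout (year, index name, value) rather than one column per index. A rough sketch of that reshaping in base R follows. Note that `toy_results` and `toy_indices()` are hypothetical stand-ins so the example runs without bipartite installed. The assumption is that networklevel returns a named numeric vector per matrix, as the as.list() call in the question suggests.

```r
# Toy named list of matrices standing in for Results (hypothetical values)
toy_results <- list(
  "2004" = matrix(c(2, 0, 0, 1), nrow = 2,
                  dimnames = list(c("A", "B"), c("X", "Y"))),
  "2005" = matrix(c(1, 1, 0, 3), nrow = 2,
                  dimnames = list(c("A", "C"), c("Y", "Z")))
)

# Hypothetical stand-in for bipartite::networklevel(): any function that
# returns a named numeric vector per matrix fits the same pattern
toy_indices <- function(m) {
  c(n_links = sum(m > 0), total_count = sum(m))
}

per_year <- lapply(toy_results, toy_indices)

# One row per (year, index) pair
long <- do.call(rbind, lapply(names(per_year), function(yr) {
  data.frame(year = yr,
             index = names(per_year[[yr]]),
             value = unname(per_year[[yr]]),
             stringsAsFactors = FALSE)
}))
long
```

With the real data, swapping `toy_results` for the named Results list and `toy_indices` for `bipartite::networklevel` should give the same shape, and it avoids the melt() step for each year.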