Here is some dummy data:
class<-c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av")
otu<-c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av")
value<-c(0,1,12,13,300,1,2,3,4,0,0,2,4)
type<-c("b","c","d","a","b","c","d","d","d","c","b","a","a")
location<-c("b","c","d","a","b","d","d","d","d","c","b","a","a")
datafr1<-data.frame(class,otu,value,type,location)
I want to get rid of any OTU if any replicate within the group 'location' and 'type' is 0, because I am interested in OTUs shared between all the replicates within a group.
I want to calculate two things. One: percent abundance of 'value' for all OTUs shared between the group 'location' and type' (abundance) Two: count the number of OTUs shared in each class (otu.freq)
Caveat is that I would like the OTUs to classified by 'class', and not the OTU name (because it s meaningless).
Expected output:
class location type abundance otu.freq
ab a a 79 2
av a a 21 1
ab b b 100 1
ab c c 100 1
ad d c 100 1
ab d d 24 2
ad d d 76 2
I have a much larger data frame and have tried the suggestions using dplyr here but I ran out of RAM, so I don't know if it worked or not.
The solution provided by @Akron below does not count the occurrences where the abundance is 0, but it does not get rid of that OTU from the other replicates within that group. If any OTU has an abundance of 0, then it is not shared between that group and I need to completely discount it from the abundance and otu.freq calculation.
library(dplyr)
so_many_shared3<-datafr1 %>%
group_by(class, location, type) %>%
summarise(abundance=sum(value)/sum(datafr1[['value']])*100, otu.freq=sum(value !=0))
class location type abundance otu.freq
1 ab a a 4.3859649 2
2 ab b b 87.7192982 1
3 ab c c 0.2923977 1
4 ab d d 1.4619883 2
5 ad b b 0.0000000 0
6 ad d c 0.2923977 1
7 ad d d 4.6783626 2
8 av a a 1.1695906 1
You could do this in one step using either data.table
library(data.table)
val = sum(datafr1$value)
setDT(datafr1)[order(class,type), list(abundance =
sum(value)/val*100, otu.freq = .N),
by = .(class, location, type)]
Or using dplyr
library(dplyr)
datafr1 %>%
group_by(class, location, type) %>%
summarise(abundance=sum(value)/sum(datafr1[['value']])*100, otu.freq=n())
# class location type abundance otu.freq
#1 ab a a 4.3859649 2
#2 ab b b 87.7192982 2
#3 ab c c 0.2923977 2
#4 ab d d 1.4619883 2
#5 ad b b 0.0000000 1
#6 ad d c 0.2923977 1
#7 ad d d 4.6783626 2
#8 av a a 1.1695906 1
Based on the new criteria, I am updating the code suggested by the OP (@K.Brannen)
datafr1 %>%
group_by(class, location, type) %>%
summarise(abundance=sum(value)/sum(datafr1[['value']])*100,
otu.freq=sum(value !=0))
Based on the updated expected result
datafr1 %>%
filter(value!=0) %>%
group_by(location, type) %>%
mutate(value1=sum(value)) %>%
group_by(class, add=TRUE) %>%
summarise(abundance=round(100*sum(value)/unique(value1)),
otu.freq=n())
# location type class abundance otu.freq
#1 a a ab 79 2
#2 a a av 21 1
#3 b b ab 100 1
#4 c c ab 100 1
#5 d c ad 100 1
#6 d d ab 24 2
#7 d d ad 76 2