I have this data.frame of five possible character states (genotypes):
genotypes <- c("0/0","1/1","0/1","1/0","./.")
library(dplyr)
set.seed(1)
df <- do.call(rbind, lapply(1:100, function(i)
  matrix(sample(genotypes, 30, replace = TRUE), nrow = 1,
         dimnames = list(NULL, paste0("V", 1:30))))) %>%
  data.frame()
And I want to summarize each row into how many I have of each:
ref.hom (0/0)
alt.hom (1/1)
het (0/1 or 1/0)
na (./.)
This seems rather slow:
sum.df <- do.call(rbind, lapply(1:nrow(df), function(i){
  data.frame(ref.hom = length(which(df[i,] == "0/0")),
             alt.hom = length(which(df[i,] == "1/1")),
             # het: combine both comparisons before calling which()
             het = length(which(df[i,] == "0/1" | df[i,] == "1/0")),
             na = length(which(df[i,] == "./.")))
}))
Is there a more efficient, perhaps dplyr-based, way to do this?
With dplyr, you can try:
df %>%
  transmute(ref.hom = rowSums(. == "0/0"),
            alt.hom = rowSums(. == "1/1"),
            het = rowSums(. == "0/1") + rowSums(. == "1/0"),
            na = rowSums(. == "./."))
   ref.hom alt.hom het na
1        4      11   9  6
2        5       2  20  3
3        3      11  10  6
4        5       5  15  5
5        5       4  17  4
6        3       8  13  6
7        6       8  11  5
8        4       8  11  7
9        6       6  14  4
10      14       8   5  3
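
If you would rather not depend on dplyr, the same vectorized idea can probably be written in base R. This is only a sketch: it assumes every cell holds one of the five genotype strings, and the 5-row data frame and the name `m` are made up here to keep the example small. Converting to a character matrix once means each comparison runs over the whole table instead of one row at a time.

```r
# Small hypothetical example; the data frame in the question has 100 rows.
genotypes <- c("0/0", "1/1", "0/1", "1/0", "./.")
set.seed(1)
df <- data.frame(matrix(sample(genotypes, 5 * 30, replace = TRUE), nrow = 5,
                        dimnames = list(NULL, paste0("V", 1:30))))

# Convert once so each comparison works on a character matrix.
m <- as.matrix(df)

# Each comparison yields a logical matrix; rowSums() counts the TRUEs per row.
sum.df <- data.frame(ref.hom = rowSums(m == "0/0"),
                     alt.hom = rowSums(m == "1/1"),
                     het     = rowSums(m == "0/1" | m == "1/0"),
                     na      = rowSums(m == "./."))

# Sanity check: the four counts in each row should add up to the column count.
stopifnot(all(rowSums(sum.df) == ncol(df)))
```

The final `stopifnot()` is a cheap way to catch a genotype string that none of the four categories covers.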