I have a dataframe that resembles the following:
ID | X | Y | A_1_l | A_2_m | B_1_n | B_2_l | C_1_m | C_2_n | C_3_l |
---|---|---|---|---|---|---|---|---|---|
w | X | Y | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
x | X | Y | 0 | 0 | 3 | 0 | 0 | 0 | 0 |
y | X | Y | 0 | 1 | 0 | 4 | 0 | 1 | 0 |
z | X | Y | 3 | 4 | 5 | 6 | 2 | 1 | 5 |
In each column name, the first letter denotes a sample, the number a repetition, and the second letter a batch. For each ID, I am trying to count the number of samples that have at least one value > 0, and to store these counts in a list.
This is the desired result, as a list that I can append to an existing dataframe:
0,1,3,3
For a previous analysis I used strsplit to count the total number of samples per batch:
colsList <- colnames(df)
cols <- grep("_", colsList, value = TRUE)
splitList <- strsplit(cols, "_\\d_")
stats <- data.frame(t(as.data.frame.list(splitList)))
rownames(stats) <- NULL
names(stats) <- c("Sample", "Batch")
perSample <- aggregate(Sample ~ Batch, stats,
                       function(x) length(unique(x))) # number of strains
I was also able to find the total number of columns with a value > 0 using rowSums(df[sapply(df, is.numeric)] > 0), but I can't seem to figure out how to combine the two to find the number of samples with a value > 0.
First filter the data to keep only the numeric columns. Use split.default to divide the columns into groups by sample, so that all the 'A' columns are in one group, the 'B' columns in another, and so on. Within each group, return TRUE for a row if at least one of its values is greater than 0 (for non-negative data this is the same as the row sum being greater than 0), then sum the TRUE values across the groups to get the final count for each row.
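# Keep only the numeric (sample) columns
tmp <- Filter(is.numeric, df)
# Group columns by sample letter (the text before the first "_"), flag rows with
# a positive row sum in each group, then count the flagged groups per row
rowSums(sapply(split.default(tmp, sub('_.*', '', names(tmp))),
               function(x) rowSums(x) > 0))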
#[1] 0 1 3 3
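The approach above can be tried end to end on the example data. The following is a minimal sketch, not the only way to do it: the data frame is rebuilt from the table in the question, the names sample_id, hits and n_positive are made up for illustration, and comparing each cell with > 0 before summing is a small variation (on the assumption that negative values might occur) so that negative and positive values in the same sample cannot cancel out.

```r
# Rebuild the example table from the question (values copied from it).
df <- data.frame(
  ID = c("w", "x", "y", "z"),
  X = "X", Y = "Y",
  A_1_l = c(0, 0, 0, 3), A_2_m = c(0, 0, 1, 4),
  B_1_n = c(0, 3, 0, 5), B_2_l = c(0, 0, 4, 6),
  C_1_m = c(0, 0, 0, 2), C_2_n = c(0, 0, 1, 1), C_3_l = c(0, 0, 0, 5),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns and derive the sample letter of each one.
tmp <- Filter(is.numeric, df)
sample_id <- sub("_.*", "", names(tmp))  # hypothetical helper name

# One logical column per sample: TRUE where any cell in that sample is > 0.
# Testing x > 0 per cell (an assumption, not part of the answer above)
# avoids cancellation if the data ever contain negative values.
hits <- sapply(split.default(tmp, sample_id),
               function(x) rowSums(x > 0) > 0)

# Count the samples with at least one positive value for each row (ID).
n_positive <- rowSums(hits)

# Append the counts to the existing data frame as a new column.
df$n_positive <- n_positive
```

Note that sapply only returns a matrix here because the data frame has more than one row; with a single row the result would collapse to a vector and rowSums would fail.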