I have a matrix with 276 sample columns and 215,000 rows. All values are non-negative, ranging from zero upwards.
I'll use mtcars as a simplified example.
I need to filter out the rows in which fewer than 50% (the percentage could be anything) of the samples reach a certain value, for example 1.
Example matrix:
Tmtcars <- t(mtcars[1:5,c(2, 8:11)])
I need to select the rows where at least 50% of the cells are equal to or greater than 1.
Only the row vs = c(0, 0, 1, 1, 0) does not meet this condition, because only 2 of its 5 cells (40%) are 1 or larger.
The row am = c(1, 1, 1, 0, 0) should be selected, because 3 of its 5 cells (60%) are equal to or greater than 1.
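These percentages can be checked directly: comparing the matrix with the threshold gives a TRUE/FALSE matrix, and counting the TRUE values per row gives the number of cells that reach it. A minimal sketch on the example matrix; `meets`, `counts` and `shares` are illustrative names, not part of the original question.

```r
# Example matrix from the question: rows are cyl, vs, am, gear, carb
Tmtcars <- t(mtcars[1:5, c(2, 8:11)])

# Logical matrix: TRUE where a cell is >= 1
meets <- Tmtcars >= 1

# Number of cells per row that reach the threshold, and their share of the row
counts <- rowSums(meets)
shares <- counts / ncol(Tmtcars)
print(shares)  # cyl 1.0, vs 0.4, am 0.6, gear 1.0, carb 1.0
```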
If I use rowMeans() like this:
Filtered <- Tmtcars[(rowMeans(Tmtcars) >= 1 ) >=0.5, ]
the row "am" is not selected.
The selection criterion has to be whether at least 50% of the cells meet the condition; it has nothing to do with the average of the values.
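For comparison, the rowMeans() attempt fails only because of where the parentheses sit: it averages the raw values and then compares that mean with 1, and the mean of am is 0.6. Since the mean of a logical vector is the share of TRUE values, moving the comparison inside rowMeans() expresses the 50% rule. A sketch, assuming the same Tmtcars; `drop = FALSE` is an addition that keeps a one-row result as a matrix.

```r
Tmtcars <- t(mtcars[1:5, c(2, 8:11)])

# Original attempt: rowMeans() of the raw values, compared with 1.
# am has a mean of 0.6, so it is dropped.
attempt <- Tmtcars[(rowMeans(Tmtcars) >= 1) >= 0.5, ]

# Comparison moved inside rowMeans(): the mean of the logical matrix
# is the share of cells >= 1 in each row.
Filtered <- Tmtcars[rowMeans(Tmtcars >= 1) >= 0.5, , drop = FALSE]

print(rownames(attempt))   # cyl, gear, carb
print(rownames(Filtered))  # cyl, am, gear, carb
```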
You can use rowSums():
set.seed(1)
# Five sample columns, each holding 5 random integers drawn from 1 to 10
sample1 <- sample(1:10, 5)
sample2 <- sample(1:10, 5)
sample3 <- sample(1:10, 5)
sample4 <- sample(1:10, 5)
sample5 <- sample(1:10, 5)
df <- data.frame(sample1, sample2, sample3, sample4, sample5)
# Keep the rows where more than half of the values are greater than 2
df2 <- df[rowSums(df > 2) > (ncol(df)/2), ]
You can obviously play with the values. The first 2 (in df > 2) is the value to compare against; the second 2 (in ncol(df)/2) keeps the rows where the comparison holds in more than 50% of the columns. For a different percentage p, use ncol(df) * p instead.
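Note that the comparisons in this answer are strict (> 2, and more than ncol(df)/2), whereas the question asks for values >= 1 in at least 50% of the cells. With an even number of columns, such as 276, a row at exactly 50% is kept only with >=. A minimal sketch wrapping the rowSums() approach in a helper that uses >= in both places; `filter_rows`, `value` and `prop` are hypothetical names chosen for illustration.

```r
# Example matrix from the question (rows: cyl, vs, am, gear, carb)
Tmtcars <- t(mtcars[1:5, c(2, 8:11)])

# Hypothetical helper: keep the rows of m in which at least a fraction
# `prop` of the cells are >= `value`
filter_rows <- function(m, value, prop = 0.5) {
  hits <- rowSums(m >= value)                # cells per row reaching the threshold
  m[hits >= prop * ncol(m), , drop = FALSE]  # drop = FALSE keeps a matrix even for one row
}

half <- filter_rows(Tmtcars, value = 1, prop = 0.5)
print(rownames(half))      # cyl, am, gear, carb

four_plus <- filter_rows(Tmtcars, value = 4, prop = 0.5)
print(rownames(four_plus)) # cyl, gear
```

On the full matrix, m >= value creates a temporary logical matrix of the same dimensions as m before rowSums() counts it.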