
Filtering rows in R when fewer than half of a row's cells meet a condition


I've got a matrix with 276 columns (samples) and 215,000 rows. The values range from zero upward, with no negative values.

I will use mtcars as a simplified example.

I need to filter out the rows where fewer than 50% (could be any percentage) of samples reach a certain value, for example 1.

Example Matrix:

Tmtcars <- t(mtcars[1:5,c(2, 8:11)])

I need to select rows where at least 50% of cells are greater than or equal to 1.

Only the row “vs = c(0,0,1,1,0)” does not meet this condition, as only 2 cells (40%) are 1 or larger.

The row “am = c(1,1,1,0,0)” should be selected, as 3 cells (60%) are greater than or equal to 1.

If I run the rowMeans function:

Filtered <- Tmtcars[(rowMeans(Tmtcars) >= 1 ) >=0.5, ]

the row "am" is not selected.

The selection criterion has to be whether at least 50% of cells meet a condition; it has nothing to do with the row average.

Thanks!


Solution

  • You can use rowSums():

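    set.seed(1)  # make the random example reproducible
    # Five samples (columns), each with five values drawn from 1:10
    sample1 <- sample(1:10, 5)
    sample2 <- sample(1:10, 5)
    sample3 <- sample(1:10, 5)
    sample4 <- sample(1:10, 5)
    sample5 <- sample(1:10, 5)
    
    df <- data.frame(sample1, sample2, sample3, sample4, sample5)
    
    # Count the cells per row that exceed the value, then keep rows
    # where that count is more than half the number of columns
    df2 <- df[rowSums(df > 2) > (ncol(df)/2), ]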
    

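    You can obviously play with the values. In `df > 2`, the 2 is the value each cell is compared with. In `ncol(df)/2`, the 2 sets the cutoff at half the columns, so only rows where more than 50% of cells match are kept. Use `>=` instead of the outer `>` if rows with exactly 50% should also be kept.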
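
Applied to the matrix from the question, the same idea might look like the sketch below. It uses rowMeans() on the logical matrix rather than rowSums(), so the cutoff can be written directly as a fraction. The names `threshold`, `min_frac` and `frac_ok` are made up for this illustration and are not from the original post.

```r
# Example matrix from the question: rows are cyl, vs, am, gear, carb
Tmtcars <- t(mtcars[1:5, c(2, 8:11)])

threshold <- 1    # value each cell must reach (illustrative name)
min_frac  <- 0.5  # minimum fraction of cells that must reach it (illustrative name)

# Tmtcars >= threshold is a logical matrix; the mean of a logical row
# is the fraction of its cells that meet the condition
frac_ok <- rowMeans(Tmtcars >= threshold)

# drop = FALSE keeps the result a matrix even if only one row survives
Filtered <- Tmtcars[frac_ok >= min_frac, , drop = FALSE]

rownames(Filtered)  # cyl, am, gear, carb ("vs" is dropped at 40%)
```

The difference from the attempt in the question is where the comparison happens: here each cell is compared with the value first and the result is averaged per row, whereas `rowMeans(Tmtcars) >= 1` averages the raw values first and then compares.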