
Delete rows satisfying condition for each column in R


I have a data frame (df) with numeric values. I would like to write a for loop that iterates through the columns. For each column, it should count the rows whose value is at or above a threshold, say 3, and then delete those rows entirely before moving on to the next column.

This is what I tried so far:


output <- vector("double", ncol(df))
for (i in 1:ncol(df)) {
  output[[i]] <- length(which(df[i] >= 3))
  df <- df[!df[,i] >= 3, ]
}

But I get the following error:

Error in matrix(if (is.null(value)) logical() else value, nrow = nr, dimnames = list(rn, : length of 'dimnames' [2] not equal to array extent


dput(head(df))

#output:
structure(list(col1 = numeric(0), col2 = numeric(0), (etc.)
NA. = integer(0)), row.names = integer(0), class = "data.frame")

  col1   col2   col3   col4     col5
1 2.09   1.10    0     21.03    0.88
3 0.00   0.00    0     11.71    0.00
4 1.50   1.10    0     1.67     1.76
5 5.10   0.00    0     0.83     17.94
6 0.00   6.34    0     2.10     0.00

In the example above, the final output I am interested in is a vector with the number of rows deleted per column: (1,1,0,2,0).


Solution

  • Here's a way with a for loop:

    dummy_df <- df # dummy_df in case you don't want to alter original df
    output <- rep(0, ncol(df)) # initialize output
    
    for(i in 1:ncol(df)) {
      if(nrow(dummy_df) == 0) break # loop breaks if all rows are removed
      if(!any(dummy_df >= 3)) break # loop breaks if no values >= 3 remain
      output[i] <- sum(dummy_df[[i]] >= 3) # rows about to be dropped for column i
      dummy_df <- dummy_df[dummy_df[[i]] < 3, , drop = FALSE] # keep the remaining rows
    }
    
    output
    [1] 3 0 1
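
The same loop can be wrapped in a function and checked against the example from the question. This is only a minimal sketch: the name `count_and_drop` and its `threshold` argument are made up for illustration, the data frame is retyped from the rows printed in the question, and it assumes there are no missing values.

```r
# Hypothetical helper (name and arguments are not from the original answer).
count_and_drop <- function(df, threshold = 3) {
  remaining <- df
  output <- integer(ncol(df))            # one count per column, initialised to 0
  for (i in seq_len(ncol(df))) {
    if (nrow(remaining) == 0) break      # nothing left to test
    hit <- remaining[[i]] >= threshold   # logical vector for column i
    output[i] <- sum(hit)                # rows deleted because of column i
    remaining <- remaining[!hit, , drop = FALSE]
  }
  output
}

# Rows retyped from the question's example
question_df <- data.frame(
  col1 = c(2.09, 0.00, 1.50, 5.10, 0.00),
  col2 = c(1.10, 0.00, 1.10, 0.00, 6.34),
  col3 = c(0, 0, 0, 0, 0),
  col4 = c(21.03, 11.71, 1.67, 0.83, 2.10),
  col5 = c(0.88, 0.00, 1.76, 17.94, 0.00)
)

result <- count_and_drop(question_df)
print(result)
```

For this data the result is 1 1 0 2 0, which matches the vector requested in the question.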
    

    Another way uses apply, which is probably faster than the loop above:

    # columns with no deleted rows are omitted from the table; they can be added back if needed
    table(apply(df, 1, function(x) match(TRUE, x >= 3)))
    1 3 
    3 1
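
If the zero counts are wanted as well, one option is to turn the first-match indices into a factor with one level per column before tabulating. A minimal sketch, assuming the example data from this answer (retyped here) and no missing values; `first_hit` and `counts` are illustrative names only.

```r
df <- data.frame(a = 1:5, b = c(1, 2, 3, 10, 2), c = c(1, 5, 2, 1, 1))

# Index of the first column in each row with a value >= 3 (NA if there is none)
first_hit <- apply(df, 1, function(x) match(TRUE, x >= 3))

# A factor with one level per column keeps the zero counts;
# table() leaves out the NA entries by default
counts <- table(factor(first_hit, levels = seq_len(ncol(df))))
counts <- setNames(as.vector(counts), names(df))
print(counts)
```

This gives one count per column, in column order, so it lines up with the vector produced by the loop.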
    

    Data (thanks to @Sada93):

      a  b c
    1 1  1 1
    2 2  2 5
    3 3  3 2
    4 4 10 1
    5 5  2 1
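
To reproduce the outputs above, the printed data frame can be rebuilt as follows. The values are retyped from the printout; storing them as plain numeric columns is an assumption.

```r
# Rebuild the example data shown above
df <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(1, 2, 3, 10, 2),
  c = c(1, 5, 2, 1, 1)
)
print(df)
```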