r, statistics, rows

How to eliminate variables based on a comparison of specific rows


I have data that contains twelve rows and more than 500 variables. I want to keep only the variables whose value in row 9 is greater than 5 times the value in row 10.

Example of data:

       Name   ClassType    Col1   Col2   Col3     
       ---------------------------------------
        A      Class1       10     50    12        
        B      Class2        7     20    12
        C      Class1        8     12     8
        D      Class1        9     14    17
        E      Class2        3     15    14
        F      Class2       10     15    16
        G      Class2       12     22    15
        H      Class1       10     28    10
        I       NA          50     10    30
        J       NA           8      5     2

Result I want: Col2 is deleted because its value in row 9 (10) is less than 5 times its value in row 10 (5 * 5 = 25):

      Name   ClassType    Col1   Col3     
      -------------------------------
        A      Class1       10    12        
        B      Class2        7    12
        C      Class1        8     8
        D      Class1        9    17
        E      Class2        3    14
        F      Class2       10    16
        G      Class2       12    15
        H      Class1       10    10
        I       NA          50    30
        J       NA           8     2

I tried a for loop with an if condition, but it didn't give me good results, so I want to know if there's any other way.

The code I tried:

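# Start from the first column and append every column that passes the test.
data_4 <- as.data.frame(data_3[, 1, drop = FALSE])

for (i in 2:640) {
  a <- as.numeric(data_3[9, i])   # value in row 9
  b <- as.numeric(data_3[10, i])  # value in row 10

  # isTRUE() skips columns where the comparison is NA (e.g. non-numeric columns)
  if (isTRUE(a > 5 * b)) {
    data_4 <- cbind(data_4, data_3[, i, drop = FALSE])
  }
}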

Thank you


Solution

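  • We can use select() with where() to keep the character columns together with the numeric columns that satisfy the condition, i.e. the 9th element of the column is greater than 5 times the last value. In the example the last row is row 10; if the data has more rows, replace the last() call with an nth() call for position 10.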

    library(dplyr)
    df1 <- df1 %>% 
      dplyr::select(where(is.character),
           where(~ is.numeric(.x) && nth(.x, 9) > 5 * last(.x)))
    

    Output:

    df1
        Name ClassType Col1 Col3
    1     A    Class1   10   12
    2     B    Class2    7   12
    3     C    Class1    8    8
    4     D    Class1    9   17
    5     E    Class2    3   14
    6     F    Class2   10   16
    7     G    Class2   12   15
    8     H    Class1   10   10
    9     I      <NA>   50   30
    10    J      <NA>    8    2
    

    Data:

    df1 <- structure(list(Name = c("A", "B", "C", "D", "E", "F", "G", "H", 
    "I", "J"), ClassType = c("Class1", "Class2", "Class1", "Class1", 
    "Class2", "Class2", "Class2", "Class1", NA, NA), Col1 = c(10L, 
    7L, 8L, 9L, 3L, 10L, 12L, 10L, 50L, 8L), Col2 = c(50L, 20L, 12L, 
    14L, 15L, 15L, 22L, 28L, 10L, 5L), Col3 = c(12L, 12L, 8L, 17L, 
    14L, 16L, 15L, 10L, 30L, 2L)), class = "data.frame", row.names = c(NA, 
    -10L))
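
If dplyr is not available, the same filter can probably be written in base R. The following is only a minimal sketch, not part of the original answer: keep_col() and its arguments hi, lo and factor are hypothetical names introduced here for illustration. It compares row 9 against row 10 explicitly instead of against the last row, so it should also apply to data with more than ten rows. Unlike the select() call above, it keeps every non-numeric column (not only character ones) and preserves the original column order.

```r
# Toy data mirroring the example in the question (Col2 should be dropped).
df1 <- data.frame(
  Name = LETTERS[1:10],
  ClassType = c("Class1", "Class2", "Class1", "Class1", "Class2",
                "Class2", "Class2", "Class1", NA, NA),
  Col1 = c(10L, 7L, 8L, 9L, 3L, 10L, 12L, 10L, 50L, 8L),
  Col2 = c(50L, 20L, 12L, 14L, 15L, 15L, 22L, 28L, 10L, 5L),
  Col3 = c(12L, 12L, 8L, 17L, 14L, 16L, 15L, 10L, 30L, 2L),
  stringsAsFactors = FALSE
)

# keep_col() is a hypothetical helper: it returns TRUE for non-numeric
# columns, and for numeric columns whose value in row `hi` exceeds
# `factor` times the value in row `lo`.
keep_col <- function(x, hi = 9, lo = 10, factor = 5) {
  if (!is.numeric(x)) return(TRUE)
  isTRUE(x[hi] > factor * x[lo])  # an NA comparison means "drop the column"
}

# One logical flag per column, then subset the columns by that flag.
keep <- vapply(df1, keep_col, logical(1))
df2 <- df1[, keep, drop = FALSE]
names(df2)
```

For the example data, Col1 passes (50 > 5 * 8), Col2 fails (10 < 5 * 5) and Col3 passes (30 > 5 * 2), so df2 should contain Name, ClassType, Col1 and Col3.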