R: subset by multiple columns


I am a bit confused about the logic for subsetting a dataset based on conditions that span multiple columns.

For example, if this is my dataset:

ID   Sex Age  Score
1    M   4.2  19
1    M   4.8  21
2    F   6.1  23
2    F   6.7  45
3    F   9.4  39
4    M   8    33
5    M   10   56

For rows where Sex is M and Age is between 6 and 11, the acceptable range of Score is 34 to 100. Rows in that group with a Score outside this range should be dropped.

The final dataset would therefore be the same, but without ID 4:

ID   Sex Age  Score
1    M   4.2  19
1    M   4.8  21
2    F   6.1  23
2    F   6.7  45
3    F   9.4  39
5    M   10   56

I tried this approach,

Df0 <- subset( Df0, (between(Age, 6, 11) &
                     Sex == "M" &
                     between(Score, 34, 100)))

And this did not work. Any suggestions are much appreciated. Thanks in advance.


Solution

  • If I understand your explanation and the expected output correctly, you are looking for something like this:

    library(dplyr)
    
    df %>%
      group_by(ID) %>%
      filter(ifelse(Sex == 'M' & between(Age, 6,11), 
              between(Score, 34, 100), TRUE)) %>%
      ungroup
    
    #     ID Sex     Age Score
    #  <int> <chr> <dbl> <int>
    #1     1 M       4.2    19
    #2     1 M       4.8    21
    #3     2 F       6.1    23
    #4     2 F       6.7    45
    #5     3 F       9.4    39
    #6     5 M      10      56
    

    between(Score, 34, 100) is only checked when Sex is 'M' and Age is between 6 and 11; every other row gets TRUE from ifelse() and is therefore kept.
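
The same "only check the score when the rule applies" logic can also be written as a plain logical expression in base R, which may make it easier to see why the attempt in the question kept too few rows. This is a minimal sketch, not part of the original answer: the data frame is rebuilt from the table in the question (column types are assumed), and the names in_scope, score_ok, result and attempt are made up for illustration.

```r
# Rebuild the example data from the question (column types are assumed).
df <- data.frame(
  ID    = c(1, 1, 2, 2, 3, 4, 5),
  Sex   = c("M", "M", "F", "F", "F", "M", "M"),
  Age   = c(4.2, 4.8, 6.1, 6.7, 9.4, 8, 10),
  Score = c(19, 21, 23, 45, 39, 33, 56)
)

# Rows the rule applies to: males aged 6 to 11 (inclusive bounds,
# matching the behaviour of dplyr::between).
in_scope <- df$Sex == "M" & df$Age >= 6 & df$Age <= 11

# Whether the score falls inside the acceptable range.
score_ok <- df$Score >= 34 & df$Score <= 100

# Keep a row if the rule does not apply to it, or if it passes the rule.
result <- df[!in_scope | score_ok, ]

# For comparison: combining all three conditions with & (as in the
# question) keeps only in-scope rows that pass, and drops everything else.
attempt <- df[in_scope & score_ok, ]
```

Here result holds the six rows from the expected output, while attempt holds only the row for ID 5, since that is the only male aged 6 to 11 with a score in range.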