Tags: r, if-statement, dplyr, slice, grepl

Conditional filtering using grepl and relative row position in group


I have a dataset similar to the following:

Journal_ref <- c("1111","2222","2222","2222","3333","3333","4444","4444")
Journal_type <- c("Adj","Rev","Adj","Rev","Rev","Rev","Adj","Adj")
Journal_value <- c(90,10000,12000,80,9000,500,65,2500)
Dataset <- data.frame(Journal_ref,Journal_type,Journal_value)

For each Journal_ref group I am seeking to filter/select rows based on the following conditions:

  • If the group contains at least one "Adj" row in Journal_type, return the last "Adj" row in that Journal_ref group, and
  • If the group contains no "Adj" row, return the last "Rev" row in that Journal_ref group

Based on the example above, the final output required would be:

Journal_ref Journal_type Journal_value
1111        Adj                    90
2222        Adj                 12000
3333        Rev                   500
4444        Adj                  2500

I have attempted using various combinations of group_by, filter, if, ifelse, grepl, select and slice with no success.

Any help would be appreciated, particularly using dplyr.


Solution

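  • Try this. It first reduces each Journal_ref/Journal_type pair to its last value, then drops the "Rev" row from any Journal_ref that also has an "Adj" row. Note that it relies on Journal_type containing only "Adj" and "Rev":

    library(dplyr)
    
    Dataset %>%
      # Keep the last value for each Journal_ref / Journal_type pair
      group_by(Journal_ref, Journal_type) %>%
      summarise(Journal_value = last(Journal_value)) %>%
      # Regroup by Journal_ref only: at most one "Adj" and one "Rev" row remain
      ungroup() %>% group_by(Journal_ref) %>%
      # If both types are present in a group, drop the "Rev" row
      filter(!(n() > 1 & Journal_type == "Rev"))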
    

    Output:

      Journal_ref Journal_type Journal_value
      <fct>       <fct>                <dbl>
    1 1111        Adj                     90
    2 2222        Adj                  12000
    3 3333        Rev                    500
    4 4444        Adj                   2500
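
An alternative sketch, not part of the original answer, follows the wording of the question more directly: for each Journal_ref, decide which type to keep ("Adj" if the group has any, otherwise "Rev"), then take the last matching row. It uses only group_by, filter, slice and ungroup from dplyr. The name `result` is just an illustrative variable, and the data is the sample from the question.

```r
library(dplyr)

# Sample data from the question
Dataset <- data.frame(
  Journal_ref   = c("1111", "2222", "2222", "2222", "3333", "3333", "4444", "4444"),
  Journal_type  = c("Adj", "Rev", "Adj", "Rev", "Rev", "Rev", "Adj", "Adj"),
  Journal_value = c(90, 10000, 12000, 80, 9000, 500, 65, 2500)
)

result <- Dataset %>%
  group_by(Journal_ref) %>%
  # Within each group, keep "Adj" rows if there are any, otherwise keep "Rev" rows
  filter(Journal_type == if (any(Journal_type == "Adj")) "Adj" else "Rev") %>%
  # Take the last remaining row of each group
  slice(n()) %>%
  ungroup()

result
```

On the sample data this should give the same four rows as above. Unlike the summarise-based version, it keeps any other columns of the selected row. A group containing neither "Adj" nor "Rev" would simply return no rows.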