Categorize down a column based on conditions


I have a dataframe that looks like the following:

group1 <- c('A','A','A','A',
            'B','B','B','B',
            'C','C','C','C')

group2 <- c(1, 2, 3, 4,
            1, 2, 3, 4,
            1, 2, 3, 4)


indicator <- c(1, 1, 1, 1,
               1, 1, NA, 1,
               NA, 1, 1, NA)


df <- data.frame(group1, group2, indicator)

I want to create a new column, category, that labels each indicator value, evaluated in group2 order within each group1 group, by the following logic:

(1) First value of 1 within a group = "New"
(2) First value of 1 after an NA, when the group has already had a 1 = "Return"
(3) All other values of 1 = "Normal"
(4) All NAs = "None"

The resulting dataframe would look like the following:

group1   group2    indicator    category
A        1         1            New
A        2         1            Normal
A        3         1            Normal
A        4         1            Normal
B        1         1            New
B        2         1            Normal
B        3         NA           None
B        4         1            Return
C        1         NA           None
C        2         1            New
C        3         1            Normal
C        4         NA           None

The part that is tripping me up is evaluating down a column, where each row's label depends on the rows above it. How should I go about producing the desired dataframe?


Solution

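    library(dplyr)  # .by and .default need dplyr >= 1.1.0

    df %>%
      mutate(
        # Base labels: NA -> "None", anything else -> "Normal"
        category = ifelse(is.na(indicator), "None", "Normal"),
        category = case_when(
          category == "None"              ~ "None",    # NA rows keep "None"
          cumsum(category != "None") == 1 ~ "New",     # first 1 in the group
          lag(category) == "None"         ~ "Return",  # 1 directly after an NA
          .default = category
        ),
        .by = group1  # evaluate within each group1 group
      )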
    
    #>    group1 group2 indicator category
    #> 1       A      1         1      New
    #> 2       A      2         1   Normal
    #> 3       A      3         1   Normal
    #> 4       A      4         1   Normal
    #> 5       B      1         1      New
    #> 6       B      2         1   Normal
    #> 7       B      3        NA     None
    #> 8       B      4         1   Return
    #> 9       C      1        NA     None
    #> 10      C      2         1      New
    #> 11      C      3         1   Normal
    #> 12      C      4        NA     None
    

    Created on 2023-12-06 with reprex v2.0.2
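
To make the logic easier to follow, the same four rules can be sketched as a function of a single group's indicator vector. This is only an illustration, not part of the original answer: the helper name categorize_indicator is hypothetical, and it assumes dplyr 1.1.0 or later (for .by and .default) and that rows are already in group2 order. A running count of non-NA values finds the first 1, and lag() checks whether the previous row was NA.

```r
library(dplyr)

# Hypothetical helper (not from the original answer): label one group's
# indicator vector according to the four rules in the question.
categorize_indicator <- function(indicator) {
  present <- !is.na(indicator)
  case_when(
    !present                      ~ "None",    # rule 4: NA
    cumsum(present) == 1          ~ "New",     # rule 1: first 1 in the group
    !lag(present, default = TRUE) ~ "Return",  # rule 2: 1 directly after an NA
    .default = "Normal"                        # rule 3: every other 1
  )
}

# Same data as in the question
df <- data.frame(
  group1    = rep(c("A", "B", "C"), each = 4),
  group2    = rep(c(1, 2, 3, 4), times = 3),
  indicator = c(1, 1, 1, 1,
                1, 1, NA, 1,
                NA, 1, 1, NA)
)

# Apply the helper separately within each group1 group
result <- df %>%
  mutate(category = categorize_indicator(indicator), .by = group1)

# Edge case not present in the sample data: consecutive NAs
edge <- categorize_indicator(c(1, NA, NA, 1))
edge
#> [1] "New"    "None"   "None"   "Return"
```

Because the NA check comes first in case_when(), an NA row can never be relabeled "New" or "Return", whatever the rows before it contain.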