I have a dataframe that looks like the following:
group1 <- c('A','A','A','A',
            'B','B','B','B',
            'C','C','C','C')
group2 <- c(1, 2, 3, 4,
            1, 2, 3, 4,
            1, 2, 3, 4)
indicator <- c(1, 1, 1, 1,
               1, 1, NA, 1,
               NA, 1, 1, NA)
df <- data.frame(group1, group2, indicator)
I want to create a new column that classifies each value of indicator, reading down the group2 values within each value of group1, by the following logic:
(1) First value of 1 = "New"
(2) First value of 1 after NA = "Return"
(3) All other values of 1 = "Normal"
(4) All NAs = "None"
The resulting dataframe would look like the following:
group1 group2 indicator category
A      1      1         New
A      2      1         Normal
A      3      1         Normal
A      4      1         Normal
B      1      1         New
B      2      1         Normal
B      3      NA        None
B      4      1         Return
C      1      NA        None
C      2      1         New
C      3      1         Normal
C      4      NA        None
The part that is tripping me up is evaluating down a column. How should I go about producing the desired dataframe?
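library(dplyr)  # .by and .default need dplyr >= 1.1.0

df %>%
  # Step 1: split rows into "None" (NA) and a provisional "Normal"
  mutate(category = ifelse(is.na(indicator), "None", "Normal"),
         # Step 2: within each group1, relabel the first 1 and any 1 that follows an NA.
         # The leading "None" guard keeps NA rows from matching the later conditions
         # (e.g. an NA right after the first 1, or two NAs in a row).
         category = case_when(category == "None" ~ "None",
                              cumsum(category != "None") == 1 ~ "New",
                              lag(category) == "None" ~ "Return",
                              .default = category),
         .by = group1)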
#>    group1 group2 indicator category
#> 1       A      1         1      New
#> 2       A      2         1   Normal
#> 3       A      3         1   Normal
#> 4       A      4         1   Normal
#> 5       B      1         1      New
#> 6       B      2         1   Normal
#> 7       B      3        NA     None
#> 8       B      4         1   Return
#> 9       C      1        NA     None
#> 10      C      2         1      New
#> 11      C      3         1   Normal
#> 12      C      4        NA     None
Created on 2023-12-06 with reprex v2.0.2
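The sample data has no group with two NAs in a row, or with an NA directly after the first 1, so it can be worth checking the four rules on such inputs too. Below is a minimal base R sketch for that, assuming the rows of each group are already sorted by group2. The helper name categorise is hypothetical and not part of the answer above; it labels the indicator vector of a single group.

```r
# Hypothetical helper (not from the answer above): label one group's
# indicator vector according to the four rules in the question.
categorise <- function(indicator) {
  seen    <- !is.na(indicator)                  # TRUE where indicator is 1
  prev_na <- c(TRUE, head(is.na(indicator), -1)) # previous row was NA (or there is none)
  out <- rep("Normal", length(indicator))       # rule 3: default for 1s
  out[seen & prev_na] <- "Return"               # rule 2: a 1 right after an NA
  out[seen & cumsum(seen) == 1] <- "New"        # rule 1: first 1 in the group
  out[!seen] <- "None"                          # rule 4: NAs
  out
}

# Edge cases not present in the sample data
categorise(c(1, NA, NA, 1))
#> [1] "New"    "None"   "None"   "Return"
categorise(c(NA, NA, 1, 1))
#> [1] "None"   "None"   "New"    "Normal"

# Applying it per group to vectors shaped like the question's data
g   <- rep(c("A", "B", "C"), each = 4)
ind <- c(1, 1, 1, 1,  1, 1, NA, 1,  NA, 1, 1, NA)
unsplit(lapply(split(ind, g), categorise), g)
```

The rules are assigned in order of increasing priority, so a later assignment overwrites an earlier one; that is why "New" is written after "Return" and "None" comes last.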