I need to flag an id when it has different grade values across the grade columns. Here is what my sample dataset looks like:
df <- data.frame(id = c(11,22,33,44,55),
grade.1 = c(3,4,5,6,7),
grade.2 = c(3,4,5,NA,7),
grade.3 = c(4,4,6,5,7),
grade.4 = c(NA,NA,NA, 5, 7 ))
df$Grade <- paste0(df$grade.1, df$grade.2, df$grade.3, df$grade.4)
> df
  id grade.1 grade.2 grade.3 grade.4 Grade
1 11       3       3       4      NA 334NA
2 22       4       4       4      NA 444NA
3 33       5       5       6      NA 556NA
4 44       6      NA       5       5 6NA55
5 55       7       7       7       7  7777
When an id has different grade values in grade.1, grade.2, grade.3 and grade.4, that row needs to be flagged. Having NA in those columns does not affect the flagging.
In other words, if the Grade column at the end contains differing numbers, that id needs to be flagged.
My desired output should look like this:
> df
  id grade.1 grade.2 grade.3 grade.4        flag
1 11       3       3       4      NA     flagged
2 22       4       4       4      NA Not_flagged
3 33       5       5       6      NA     flagged
4 44       6      NA       5       5     flagged
5 55       7       7       7       7 Not_flagged
Any ideas? Thanks!
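A base R solution using rle after omitting NA values: a row whose remaining grades form a single run contains only one distinct grade, so it is not flagged.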
df$flag <- apply(df[,2:5], 1, function(x)
ifelse(length(rle(x[!is.na(x)])$lengths)==1, "not_flagged", "flagged"))
df
  id grade.1 grade.2 grade.3 grade.4        flag
1 11       3       3       4      NA     flagged
2 22       4       4       4      NA not_flagged
3 33       5       5       6      NA     flagged
4 44       6      NA       5       5     flagged
5 55       7       7       7       7 not_flagged
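To see why the rle check works, it may help to trace it by hand on a single row. The sketch below uses the grades of id 44 from the sample data; the variable names x and r are only for illustration.

```r
# Grades of id 44 (grade.1 to grade.4) from the sample data
x <- c(6, NA, 5, 5)

# Drop NA first, then run-length encode what is left
r <- rle(x[!is.na(x)])
r$values   # 6 5
r$lengths  # 1 2

# A single run would mean all grades are equal; here there are two runs
length(r$lengths) == 1  # FALSE, so this row is flagged
```

Removing NA before calling rle matters: with the NA left in, the values 6, NA, 5, 5 would always break into several runs.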
df <- structure(list(id = c(11, 22, 33, 44, 55), grade.1 = c(3, 4,
5, 6, 7), grade.2 = c(3, 4, 5, NA, 7), grade.3 = c(4, 4, 6, 5,
7), grade.4 = c(NA, NA, NA, 5, 7)), class = "data.frame", row.names = c(NA,
-5L))
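As a possible variation, not part of the answer above, the same rule can be expressed by counting distinct non-NA grades per row. This sketch assumes the grade columns all start with "grade." and that a row with only NA values should not be flagged (the rle version would flag such a row, since it produces zero runs). The names grade_cols and n_distinct are hypothetical.

```r
df <- data.frame(id = c(11, 22, 33, 44, 55),
                 grade.1 = c(3, 4, 5, 6, 7),
                 grade.2 = c(3, 4, 5, NA, 7),
                 grade.3 = c(4, 4, 6, 5, 7),
                 grade.4 = c(NA, NA, NA, 5, 7))

# Select the grade columns by name rather than by position
# (assumption: they all start with "grade.")
grade_cols <- grep("^grade\\.", names(df), value = TRUE)

# Number of distinct non-NA grades in each row
n_distinct <- apply(df[grade_cols], 1, function(x) length(unique(x[!is.na(x)])))

# Flag rows with more than one distinct grade;
# an all-NA row has 0 distinct values and stays "not_flagged"
df$flag <- ifelse(n_distinct > 1, "flagged", "not_flagged")
df
```

On the sample data this should give the same flags as the rle version. Selecting columns by name avoids relying on the grades sitting in positions 2 to 5.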