I have a data set where for each region
there is a value pos
, but some regions don't have a value of 1.
region <- c('A','A','B','B','C','D','D','E')
pos <- c(0,1,0,1,0,0,1,0)
df <- data.frame(region, pos)
For each region, I would like to select the row where the value of pos
is 1. If there is a region where there is no pos
value of 1, I would like to select the 0 row. So in essence I would like to have a resulting data set that has one row for each region but for the regions where this is a 1, I would like to select that row. There is never more than 2 rows for each region.
The result should look like this:
df <
Region pos
1 A 1
2 B 1
3 C 0
4 D 1
5 E 0
Thanks in advance for your help.
If you're certain all values are 0s and 1s, you might be able to get away with:
library(dplyr)
df %>%
slice_max(pos, by = region, with_ties = FALSE)
# region pos
# 1 A 1
# 2 B 1
# 3 C 0
# 4 D 1
# 5 E 0
(You can verify this returns at most one row per region by using bind_rows(df, df)
instead of df
, just to demonstrate.)
If that's not safe, though, then something like
df %>%
group_by(region) %>%
filter(row_number() == c(which(pos == 1), which(pos == 0))[1]) %>%
ungroup()
# # A tibble: 5 × 2
# region pos
# <chr> <dbl>
# 1 A 1
# 2 B 1
# 3 C 0
# 4 D 1
# 5 E 0
In this case, if a region has neither pos == 1
nor pos == 0
then the region will go away.