I have a large data set divided into many small groups by a grouping variable = grp
; all members of a group are contiguous in the order of the larger data set. The members of a group each have an id code (= id
) and are numbered sequentially from 1. Within a group, some members meet a logical criterion = is_child
. Each member has a variable (momloc
) that contains either zero or the ID number of another of another group member (the mother if present).
I wish to assign to each individual in the data set the number of group members who has momloc equal to their ID, and zero if none do. I am trying to do this in dplyr as I have the groups set up there, and I have code that works, but it is a Rube Goldberg contraption of nested ifelse functions that adds two additional columns for intermediate values, one of which contains a vector, goes through the data set three times, and is incredibly slow. There has to be a better way than that. I'm getting tangled in the different syntax for mutate, working on rows, and summary, working on groups.
Below is a simplified data set and desired outcome
grp <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2)
id <- c(1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7)
is_child <- c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0)
momloc <- c(0, 0, 2, 2, 0, 0, 0, 3, 2, 2, 2)
data <- tibble(grp, id, is_child, momloc)
desired output:
out = c(0, 2, 0, 0, 0, 2, 1, 0, 0, 0, 0)
I propose a solution only using dplyr
.
First, I only keep children (assuming that you only want to count them as your out[6] = 2
instead of 3). Then, I create a frequency table of momloc
using count()
, and merge this to the original data.
data %>%
filter(is_child == 1) %>% # only count for children
group_by(grp) %>%
count(momloc) %>%
right_join(data, by = c("grp" = "grp", "momloc" = "id")) %>%
rename(
id = momloc,
momloc = momloc.y,
out = n
) %>%
mutate(out = ifelse(is.na(out), 0, out))
#> # A tibble: 11 x 5
#> # Groups: grp [2]
#> grp id out is_child momloc
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 0 0 0
#> 2 1 2 2 0 0
#> 3 1 3 0 1 2
#> 4 1 4 0 1 2
#> 5 2 1 0 0 0
#> 6 2 2 2 0 0
#> 7 2 3 1 0 0
#> 8 2 4 0 1 3
#> 9 2 5 0 1 2
#> 10 2 6 0 1 2
#> 11 2 7 0 0 2