I have a dataframe which looks like this:
head(df)
id id_child
1 1
1 2
1 3
2 1
4 1
4 2
I would like to create a variable which counts the number of children per parent. So I would like something like this:
head(nb_children)
id id_child
1 3
2 1
3 0
4 2
If possible, I would like person 3 to be shown as having 0 children, even though she does not appear in the first data frame.
Note: ids are sequential; in the real data they run from 1 to 10628.
Any suggestions? I suppose I must use the split() function, but I really do not know how to use it.
One dplyr option could be:
library(dplyr)
df %>%
  # Make id a factor spanning every id from min to max, and keep empty groups
  group_by(id = factor(id, levels = min(id):max(id)), .drop = FALSE) %>%
  summarise(id_child = n_distinct(id_child))
id id_child
<fct> <int>
1 1 3
2 2 1
3 3 0
4 4 2
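If you would rather avoid dplyr, the same idea can be sketched in base R. This is only a minimal sketch: it assumes the parent ids are positive integers starting at 1 (as the question states), rebuilds the sample data from the question, and drops duplicate rows first so that the count matches what n_distinct() gives above. split() is not actually needed here.

```r
# Sample data reconstructed from the question
df <- data.frame(
  id       = c(1, 1, 1, 2, 4, 4),
  id_child = c(1, 2, 3, 1, 1, 2)
)

# Assumption: parent ids are positive integers running from 1 to max(id)
all_ids <- seq_len(max(df$id))

# Drop duplicated (id, id_child) pairs so each child is counted once
pairs <- unique(df)

# tabulate() counts how often each integer 1..nbins occurs;
# ids that never occur (here, 3) get a count of 0
nb_children <- data.frame(
  id       = all_ids,
  id_child = tabulate(pairs$id, nbins = max(df$id))
)

nb_children
```

For the sample data this gives counts of 3, 1, 0 and 2 for ids 1 to 4. Unlike the dplyr version, id stays an integer column rather than becoming a factor.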