
Counting occurrences and occurrences which do not appear


I have a dataframe which looks like this:

head(df)

id    id_child
1       1
1       2
1       3
2       1
4       1
4       2 

I would like to create a variable which counts the number of children per parent. So I would like something like this:

head(nb_children)

id    id_child      
1       3
2       1
3       0
4       2

If possible, I would like person 3 to be shown as having 0 children, even though she does not appear in the original data frame.

Note: ids are sequential; in the real data they run from 1 to 10628.

Any suggestions? I suppose I must use the split() function, but I really do not know how to use it.


Solution

  • One dplyr option could be:

    library(dplyr)

    # .drop = FALSE keeps empty factor levels, so ids with no rows get a count of 0
    df %>%
     group_by(id = factor(id, levels = min(id):max(id)), .drop = FALSE) %>%
     summarise(id_child = n_distinct(id_child))
    
      id    id_child
      <fct>    <int>
    1 1            3
    2 2            1
    3 3            0
    4 4            2
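
  • A base R alternative, given here only as a minimal sketch: it rebuilds the toy data from the question and assumes that every id between the smallest and largest observed value should appear in the result. The names all_ids, counts and via_split are illustrative and not from the original post. Note that table() counts rows rather than distinct children, so if the same parent/child pair can occur more than once, the data would need deduplicating first (for example with unique(df)).

```r
# Toy data copied from the question
df <- data.frame(
  id       = c(1, 1, 1, 2, 4, 4),
  id_child = c(1, 2, 3, 1, 1, 2)
)

# Assumption: every id from min to max should be reported, even if absent
all_ids <- seq(min(df$id), max(df$id))

# table() on a factor keeps unused levels, so the missing id 3 is counted as 0
counts <- table(factor(df$id, levels = all_ids))

nb_children <- data.frame(
  id       = all_ids,
  id_child = as.integer(counts)
)
nb_children

# The split() route mentioned in the question gives the same counts:
# split() keeps empty factor levels by default, and lengths() sizes each piece
via_split <- lengths(split(df$id_child, factor(df$id, levels = all_ids)))
```

    Both versions rely on the same idea as the dplyr answer: converting id to a factor with explicit levels is what makes the missing parent show up with a zero.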