Search code examples
indexingdplyrcounteridentifier

R: Index by group and year with similar number for same group and year


I have a data frame that shows the births of children by parent and year. What I need is a new column that indexes the order of birth per child, but provides the same number if two children from the same parent were born in the same year. The data look as follows:

>df1
id       year  name
parent1  2001  bas
parent1  2002  jack
parent2  1991  david
parent3  1993  daniel
parent3  1993  jasper
parent3  1994  melany
parent4  1997  john
parent4  1999  gerard

Thus far, I created an index for child per year per parent, but children get ascending numbers even if they are born in the same year. I used the following code:

> df2 <- df1 %>% group_by(id) %>% mutate(order = row_number(year))
> df2
id       year  name   order
parent1  2001  bas    1
parent1  2002  jack   2
parent2  1991  david  1
parent3  1993  daniel 1
parent3  1993  jasper 2
parent3  1994  melany 3
parent4  1997  john   4
parent4  1999  gerard 1

What I want to arrive at, however, is (note the double "1" for parent3 in year 1993):

id       year  name   order
parent1  2001  bas    1
parent1  2002  jack   2
parent2  1991  david  1
parent3  1993  daniel 1
parent3  1993  jasper 1
parent3  1994  melany 2
parent4  1997  john   3
parent4  1999  gerard 1

Do you have a solution for this problem? Is there a slight tweak for my code that solves this? Thanks in advance!


Solution

  • You can use dense_rank for row_number:

    df %>% group_by(id) %>% mutate(order = dense_rank(year))
    
    # A tibble: 8 x 4
    # Groups:   id [4]
    #       id  year   name order
    #   <fctr> <int> <fctr> <int>
    #1 parent1  2001    bas     1
    #2 parent1  2002   jack     2
    #3 parent2  1991  david     1
    #4 parent3  1993 daniel     1
    #5 parent3  1993 jasper     1
    #6 parent3  1994 melany     2
    #7 parent4  1997   john     1
    #8 parent4  1999 gerard     2