I have a data frame that shows the births of children by parent and year. What I need is a new column that indexes the order of birth per child, but provides the same number if two children from the same parent were born in the same year. The data look as follows:
>df1
id year name
parent1 2001 bas
parent1 2002 jack
parent2 1991 david
parent3 1993 daniel
parent3 1993 jasper
parent3 1994 melany
parent4 1997 john
parent4 1999 gerard
Thus far, I created an index for child per year per parent, but children get ascending numbers even if they are born in the same year. I used the following code:
> df2 <- df1 %>% group_by(id) %>% mutate(order = row_number(year))
> df2
id year name order
parent1 2001 bas 1
parent1 2002 jack 2
parent2 1991 david 1
parent3 1993 daniel 1
parent3 1993 jasper 2
parent3 1994 melany 3
parent4 1997 john 4
parent4 1999 gerard 1
What I want to arrive at, however, is (note the double "1" for parent3 in year 1993):
id year name order
parent1 2001 bas 1
parent1 2002 jack 2
parent2 1991 david 1
parent3 1993 daniel 1
parent3 1993 jasper 1
parent3 1994 melany 2
parent4 1997 john 3
parent4 1999 gerard 1
Do you have a solution for this problem? Is there a slight tweak for my code that solves this? Thanks in advance!
You can use dense_rank
for row_number
:
df %>% group_by(id) %>% mutate(order = dense_rank(year))
# A tibble: 8 x 4
# Groups: id [4]
# id year name order
# <fctr> <int> <fctr> <int>
#1 parent1 2001 bas 1
#2 parent1 2002 jack 2
#3 parent2 1991 david 1
#4 parent3 1993 daniel 1
#5 parent3 1993 jasper 1
#6 parent3 1994 melany 2
#7 parent4 1997 john 1
#8 parent4 1999 gerard 2