r dplyr dummy-variable

Dealing with ties using rank (R)


I'm trying to create a dummy variable for whether a child is first born, and another for whether the child is second born. My data looks something like this:

ID   MID   CMOB   CYRB      
1    1     1      1991
2    1     7      1989
3    2     1      1985
4    2     11     1985
5    2     9      1994
6    3     4      1992
7    4     2      1992
8    4     10     1983

With ID = child ID, MID = mother ID, CMOB = month of birth and CYRB = year of birth.

For the first born dummy I tried using this:

Identifiers_age <- Identifiers_age %>% group_by(MID) %>%
                          mutate(first = as.numeric(rank(CYRB) == 1))

But there doesn't seem to be a way of breaking ties by the rank of another column (in this case the desired column is clearly CMOB). Whenever I try passing a column to the "ties.method" argument, it tells me the input must be a character vector.

Am I missing something here?


Solution

  • order might be more convenient to use here. From ?order:

    order returns a permutation which rearranges its first argument into ascending or descending order, breaking ties by further arguments.

    Note that order returns sorting positions rather than ranks, so it has to be applied twice to get each row's rank within its group:

    Identifiers_age <- Identifiers_age %>% group_by(MID) %>% 
                       mutate(first = as.numeric(order(order(CYRB, CMOB)) == 1))
    Identifiers_age
    
    #Source: local data frame [8 x 5]
    #Groups: MID [4]
    
    #     ID   MID  CMOB  CYRB first
    #  <int> <int> <int> <int> <dbl>
    #1     1     1     1  1991     0
    #2     2     1     7  1989     1
    #3     3     2     1  1985     1
    #4     4     2    11  1985     0
    #5     5     2     9  1994     0
    #6     6     3     4  1992     1
    #7     7     4     2  1992     0
    #8     8     4    10  1983     1
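
One caveat worth spelling out: order gives the sorting permutation, not each row's rank. The two happen to coincide on the sample data above, but they can differ once a mother has three or more children listed out of birth order. The following is a minimal sketch, assuming dplyr is installed, that applies order twice to recover the rank and also builds the second-born dummy the question asks about. The names yrs, birth_order and second are illustrative and do not come from the original post.

```r
library(dplyr)

# Hypothetical three-child family, listed out of birth order
yrs <- c(1994, 1985, 1990)
order(yrs)         # 2 3 1  (positions that would sort yrs)
order(order(yrs))  # 3 1 2  (rank of each element)

# Sample data from the question
Identifiers_age <- data.frame(
  ID   = 1:8,
  MID  = c(1, 1, 2, 2, 2, 3, 4, 4),
  CMOB = c(1, 7, 1, 11, 9, 4, 2, 10),
  CYRB = c(1991, 1989, 1985, 1985, 1994, 1992, 1992, 1983)
)

Identifiers_age <- Identifiers_age %>%
  group_by(MID) %>%
  mutate(
    # Rank by year of birth, breaking ties by month of birth
    birth_order = order(order(CYRB, CMOB)),
    first  = as.numeric(birth_order == 1),
    second = as.numeric(birth_order == 2)
  ) %>%
  ungroup()

Identifiers_age
```

Keeping birth_order as its own column makes it easy to add further dummies (third born, and so on) without repeating the ranking step.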