r dplyr transform sequence identifier

Identifier as sequence starting with 1


I have an ID in my dataset indicating the user to whom each observation belongs. I want to recode this ID as a sequence starting with 1.

Example data

da1 <- data.frame(player = c(120,120,120,47,47,18,18,18), wins = c(0,2,1,0,0,2,0,1))

da1
  player wins
1    120    0
2    120    2
3    120    1
4     47    0
5     47    0
6     18    2
7     18    0
8     18    1

I want it to look like this:

da2 <- data.frame(player = c(1,1,1,2,2,3,3,3), wins = c(0,2,1,0,0,2,0,1))

da2
  player wins
1      1    0
2      1    2
3      1    1
4      2    0
5      2    0
6      3    2
7      3    0
8      3    1

I have tried the following code, but it creates a separate row counter within each user instead of one number per user.

library(tidyverse)
da1 %>%
  group_by(player) %>%
  mutate(start = 1:n())


Solution

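  • I believe the tidyverse solution would be something like the following. Note that group_indices() numbers the groups by sorted player value, so with this data player 18 becomes 1 and player 120 becomes 3: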

    da1$player <- 
      da1 %>% 
      group_by(player) %>% 
      group_indices()
    

    If you are willing to use data.table and each player's rows are already contiguous, you could do:

    da1$player <- data.table::rleid(da1$player)
    > da1
      player wins
    1      1    0
    2      1    2
    3      1    1
    4      2    0
    5      2    0
    6      3    2
    7      3    0
    8      3    1
    

    Or an all-data.table solution (a player's rows need not be contiguous; groups are numbered in order of first appearance):

    library(data.table)
    setDT(da1)[, player := .GRP, by = player]
    da1
    

    A base R alternative, which matches the desired output here only because the player IDs happen to appear in decreasing order:

    as.integer(factor(-da1$player))
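
The approaches above differ in how they number the players: by sorted value or by order of first appearance. As a minimal base R sketch (not part of the original answer), match() against unique() numbers players in the order they first appear, and does not require a player's rows to be contiguous. The names by_appearance, by_value and shuffled are only illustrative.

```r
# Example data from the question
da1 <- data.frame(player = c(120, 120, 120, 47, 47, 18, 18, 18),
                  wins   = c(0, 2, 1, 0, 0, 2, 0, 1))

# Number players by order of first appearance: 120 -> 1, 47 -> 2, 18 -> 3
by_appearance <- match(da1$player, unique(da1$player))

# Number players by sorted value instead: 18 -> 1, 47 -> 2, 120 -> 3
by_value <- as.integer(factor(da1$player))

# Order of first appearance also works when rows are interleaved
shuffled <- c(120, 47, 120, 18, 47)
shuffled_ids <- match(shuffled, unique(shuffled))

# Recode the column as requested in the question
da1$player <- by_appearance
da1
```

For the example data, by_appearance reproduces the player column of da2, while by_value gives the reverse numbering.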