Tags: r, data.table, r-factor

Order factor levels in order of appearance in data set


I have a survey in which each question must be assigned a unique ID. Some questions appear multiple times, which means there is an extra layer of questions. The sample data below includes only the first layer.

Question: how do I assign a unique index in order of appearance? The solution provided here assigns indices alphabetically. I could reorder the factor levels by hand, but that defeats the purpose of doing it in R [there are many questions to sort].

library(data.table)
dt = data.table(question = c("C", "C", "A", "B", "B", "D"), 
                value = c(10,20,30,40,20,30))

dt[, idx := as.numeric(as.factor(question))]

This gives:

#    question value idx
# 1:        C    10   3
# 2:        C    20   3
# 3:        A    30   1
# 4:        B    40   2
# 5:        B    20   2
# 6:        D    30   4

# but required is:
dt[, idx.required := c(1, 1, 2, 3, 3, 4)]

Solution

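  • I think the data.table way to do this is to use .GRP, the group counter. With by = question, groups are numbered in the order they first appear in the data rather than alphabetically: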

    dt[, idx := .GRP, by = question]
    
    ##    question value idx
    ## 1:        C    10   1
    ## 2:        C    20   1
    ## 3:        A    30   2
    ## 4:        B    40   3
    ## 5:        B    20   3
    ## 6:        D    30   4
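
For comparison, here is a minimal base R sketch of two other ways to get an index by order of appearance. It is not part of the original answer. The alphabetical result in the question comes from as.factor(), which sorts the levels by default. Both alternatives below rely on unique() returning values in order of first appearance. The variable names idx_match and idx_factor are made up for illustration.

```r
# Same question column as in the sample data
question <- c("C", "C", "A", "B", "B", "D")

# Option 1: look up each value's position among the unique values,
# which unique() returns in order of first appearance
idx_match <- match(question, unique(question))

# Option 2: build the factor with explicit levels in order of appearance,
# then convert the level codes to integers
idx_factor <- as.integer(factor(question, levels = unique(question)))

print(idx_match)
print(idx_factor)
```

Either expression should also work inside a data.table call, for example dt[, idx := match(question, unique(question))]. The .GRP version is likely the more natural choice when grouping by more than one column.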