I have a data frame where one column consists of strings, each of which is a unique identifier for a journey. A reproducible data frame:
df <- data.frame(tours = c("ansc123123", "ansc123123", "ansc123123", "baa3999", "baa3999", "baa3999"),
order = rep(c(1, 2, 3), 2))
My real data is much larger, with many more observations and unique identifiers. I would like output in the format below, but generated automatically rather than encoded by hand, so that rows with the same tours value are encoded as the same journey:
df$journey <- c(1, 1, 1, 2, 2, 2)
You can convert it to a factor and take the integer codes:
df$journey <- as.integer(factor(df$tours))
df$journey
#[1] 1 1 1 2 2 2
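One detail worth knowing: factor() sorts its levels alphabetically by default, so the numbering follows sort order, not the order in which the tours first appear. A minimal sketch with a hypothetical reordered vector (tours2 is an illustrative name, not from the question) shows the difference, and that passing levels = unique(...) keeps first-appearance order:

```r
# Hypothetical example: the "baa3999" tour appears before "ansc123123"
tours2 <- c("baa3999", "baa3999", "ansc123123", "ansc123123")

# Default factor levels are sorted, so "ansc123123" gets code 1
sorted_ids <- as.integer(factor(tours2))
print(sorted_ids)
# [1] 2 2 1 1

# Fixing the level order to first appearance numbers the tours as they occur
appearance_ids <- as.integer(factor(tours2, levels = unique(tours2)))
print(appearance_ids)
# [1] 1 1 2 2
```

For the data in the question both give the same result, because the tours already appear in sorted order.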
Or use match and unique:
match(df$tours, unique(df$tours))
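As a self-contained sketch, assigning the match() result back to the data frame from the question reproduces the hand-coded journey column; the stopifnot() check is only there to confirm that:

```r
df <- data.frame(tours = c("ansc123123", "ansc123123", "ansc123123",
                           "baa3999", "baa3999", "baa3999"),
                 order = rep(c(1, 2, 3), 2))

# match() returns the position of each tour within unique(df$tours),
# i.e. an integer ID assigned in order of first appearance
df$journey <- match(df$tours, unique(df$tours))
print(df$journey)
# [1] 1 1 1 2 2 2

# Same values as the manually encoded vector from the question
stopifnot(all(df$journey == c(1, 1, 1, 2, 2, 2)))
```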
It's also possible to use factor and get the integer values with unclass. This keeps the levels as an attribute, which lets you get back to the original values.
df$journey <- unclass(factor(df$tours))
df$journey
#[1] 1 1 1 2 2 2
#attr(,"levels")
#[1] "ansc123123" "baa3999"
levels(df$journey)[df$journey]
#[1] "ansc123123" "ansc123123" "ansc123123" "baa3999" "baa3999"
#[6] "baa3999"
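A minimal round-trip check, using a plain vector (tours, an illustrative name) instead of the data frame: indexing the stored levels by the codes recovers the original strings, and as.vector() drops the "levels" attribute if only the plain integers are wanted:

```r
tours <- c("ansc123123", "ansc123123", "ansc123123",
           "baa3999", "baa3999", "baa3999")

# Integer codes with the factor levels kept as an attribute
journey <- unclass(factor(tours))

# Indexing the stored levels by the integer codes recovers the strings
recovered <- levels(journey)[journey]
stopifnot(identical(recovered, tours))

# as.vector() removes all attributes, leaving a bare integer vector
plain <- as.vector(journey)
print(plain)
# [1] 1 1 1 2 2 2
```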