
Changing order of factor levels based on lookup in other column


I have a data frame that contains, among other columns, two columns referring to the same thing: one is a factor, and the other is its numeric ID.

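# someColumn originally listed only five values, which makes data.frame() fail;
# NA pads the sixth row so that all columns have the same length.
df = data.frame(
  "id" =  c(5, 3, 1, 2, 4, 5),
  "val" = factor(c("a", "b", "c", "d", "e", "a")),
  "someColumn" = c(13, 38, 91, 83, 19, NA)
)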

The factor values (and their IDs) repeat across rows because the other columns differ. By default, the factor levels are ordered alphabetically, regardless of the order in which the values appear in the data frame.

Here's the problem: I want to order the levels of the factor by their ID, which would make the factor easier to work with, especially in plots. I do not want to change the labels. I would be fine with changing the levels to the actual ID, but I don't think it's necessary.

In other examples I found, the suggestion was to do something like this:

factor(df$val, levels = df$val[order(df$id)])

However, this does not work in my case, because there are duplicates:

Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated

I don't want to remove rows from my original data, because I don't want to discard data or change the row order, and I want to keep working with the same data frame. Can I get rid of the warning and the duplicated levels some other way? Or should I use another approach entirely?


Solution

  • Wrap the reordered values in unique() so that each level is listed only once:

    factor(df$val, levels = unique(df$val[order(df$id)]))
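
    To check the result, here is a minimal sketch that rebuilds the example data, assigns the reordered factor back to the column, and inspects the levels. The NA in someColumn is an assumption (the question lists only five values there), and new_levels and alt are names made up for this illustration. The last lines show base R's reorder() as one possible alternative; it orders levels by a per-level summary of id (the mean by default), so it should give the same order as long as each value maps to a single ID.

```r
# Example data from the question; the NA in someColumn is an assumption,
# added so that all three columns have six rows.
df <- data.frame(
  id = c(5, 3, 1, 2, 4, 5),
  val = factor(c("a", "b", "c", "d", "e", "a")),
  someColumn = c(13, 38, 91, 83, 19, NA)
)

# order(df$id) lists the values by ascending ID; unique() then drops the
# repeated "a", so every level appears exactly once.
new_levels <- unique(as.character(df$val[order(df$id)]))

# Rebuild the factor with the new level order; rows and labels are unchanged.
df$val <- factor(df$val, levels = new_levels)

levels(df$val)        # "c" "d" "b" "e" "a"
as.character(df$val)  # "a" "b" "c" "d" "e" "a"

# Alternative sketch: reorder() sorts the levels by the mean id per level.
alt <- reorder(df$val, df$id)
levels(alt)           # "c" "d" "b" "e" "a"
```

    Only the level order changes: the rows stay where they are and no data is dropped, so plots that follow the level order will now follow the ID order.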