Is it possible to change an existing column in a data frame from a factor to an ordered factor, without hard-coding the levels and labels?
Some background: we receive a data frame through an API, and some of the coded values are factors. This is very useful and works for most cases, but now we want to aggregate on some of these multiple-choice columns.
For example, we want to get the maximum of a visit code in order to get the most recent visit per patient, or get the minimum of a 0=No, 1=Yes column to see which patients have at least one No. There are many different coded values, and we don't want to hard-code all the codes and labels for every column.
Also, the provided factor values already have correct codes that could be ordered and sorted. For example, visit codes are 1=Baseline, 2=At3m, 3=At6m, 4=At12m, 5=At18m, etc. In fact, when I inspect the data frame in the RStudio IDE, I can already sort on that column by clicking the column header. It is sorted correctly, meaning according to the underlying stored values and not alphabetically by the display labels.
So in the RStudio IDE the column is displayed like this:
Factor w/ 4 levels "Baseline","At3m",..: 1 2 4 1 2 3 NA 1 2
And I just want to change it to
Ord.factor w/ 4 levels "Baseline"<"At3m"<..: 1 2 4 1 2 3 NA 1 2
See the example code below:
# test dataframe, the "real" dataframe is provided through an API
df_test <- data.frame(
record_id = c(1001, 1001, 1001, 1002, 1002, 1003, 1004, 1005, 1005),
measurement = factor(c("Baseline", "At3m", "At12m", "Baseline", "At3m", "At6m", NA, "Baseline", "At3m"), levels=c("Baseline", "At3m", "At6m", "At12m"))
)
df_max <- aggregate(df_test$measurement, by = list(df_test$record_id), max)
# error: ‘max’ not meaningful for factors
df_max <- aggregate(as.numeric(as.character(df_test$measurement)), by = list(df_test$record_id), max)
# gives all NA (as.character() returns the labels, which are not numbers), plus the column names are changed
df_max <- ??
I know I can just change the code above and add ordered=TRUE, like measurement = factor(c("Baseline ... , ordered=TRUE), but this is just example code. In practice we get a large data frame with lots of columns, provided by an API.
I've tried searching for code examples but could only find trivial examples involving factors, but maybe I'm searching wrong.
OK, it turns out I wasn't using the right search keywords; it's pretty straightforward:
# change factor to ordered factor
df_test$measurement <- as.ordered(df_test$measurement)
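To check that this solves the original aggregation problem, here is a minimal sketch that reuses the test data frame from the question. It uses the formula interface of aggregate(), which is my own choice and not part of the original code. One thing to be aware of: with the formula interface the default na.action = na.omit drops rows where measurement is NA, so record 1004 does not appear in the result.

```r
# Rebuild the example data from the question (stand-in for the API data)
df_test <- data.frame(
  record_id = c(1001, 1001, 1001, 1002, 1002, 1003, 1004, 1005, 1005),
  measurement = factor(
    c("Baseline", "At3m", "At12m", "Baseline", "At3m", "At6m", NA, "Baseline", "At3m"),
    levels = c("Baseline", "At3m", "At6m", "At12m")
  )
)

# Keep the existing level order, but mark the factor as ordered
df_test$measurement <- as.ordered(df_test$measurement)

# max() is now defined for this column; the formula interface also keeps
# the original column names instead of Group.1 / x.
# Rows with NA in measurement are dropped by the default na.action.
df_max <- aggregate(measurement ~ record_id, data = df_test, FUN = max)
print(df_max)
```

The same call with FUN = min would give the earliest visit per patient, since min() is also defined for ordered factors.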
I'll just add that you can also do this by selecting the column dynamically, referring to it by a string value (which would have been my follow-up question):
# change factor to ordered factor
# dynamically set the column name in a string variable
colname <- "measurement"
df_test[colname] <- as.ordered(df_test[[colname]]) # double brackets [[ ]]
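Since the real data frame has many coded columns, the same idea can probably be extended to convert every factor column in one go. The sketch below is only an illustration: df_api and its smoker column are hypothetical stand-ins for the API data, and it assumes that the existing level order of every factor column is meaningful, which may not hold for purely nominal columns.

```r
# Hypothetical stand-in for the data frame returned by the API
df_api <- data.frame(
  record_id = c(1001, 1001, 1002),
  measurement = factor(
    c("Baseline", "At3m", "Baseline"),
    levels = c("Baseline", "At3m", "At6m", "At12m")
  ),
  smoker = factor(c("Yes", "No", "Yes"), levels = c("No", "Yes"))
)

# Find all factor columns without naming them explicitly
is_fac <- vapply(df_api, is.factor, logical(1))

# Convert each factor column to an ordered factor, keeping its level order
# (assumption: the existing level order is meaningful for every such column)
df_api[is_fac] <- lapply(df_api[is_fac], as.ordered)

str(df_api)
```

After this, min() and max() work on any of the converted columns, for example min(df_api$smoker) to check whether there is at least one No.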