I have two data frames: main_df
is the master table. addl_df
is a smaller table.
GOAL: to convert all character variables in addl_df
as factors with the same levels as the character variables with the same names in main_df
.
main_df <- data.frame(id=c(1, 2, 3, 4, 5), age=c(10, 20, 30, 40, 45), gender=c("F","F","M","M","F"), city=c("A","B","C","D","D"))
addl_df <- data.frame(id=c(7,8), age=c( 40, 45), gender=c("F","F"), city=c("C","D"))
Using the code below, city
will be a factor variable with 2 levels ("C" and "D). What I want is a factor w/ 4 levels "A","B","C","D" with "C" has value of 3 (same as how it's defined in main_df
).
Is it possible to do this in an automated way (instead of manually defining the variable one-by-one? Thank you!
main_df[sapply(main_df, is.character)] <- lapply(main_df[sapply(main_df, is.character)], as.factor)
addl_df[sapply(addl_df, is.character)] <- lapply(addl_df[sapply(addl_df, is.character)], as.factor)
One option is to bind the datasets with bind_rows
, while creating a data identifier ('grp'), convert the character
columns to factor
, do a group_split
by the 'grp' into a list
of data.frames, then set the names of the list
with setNames
and update the original objects with list2env
library(dplyr)
bind_rows(main_df, addl_df, .id = 'grp') %>%
mutate(across(where(is.character), factor)) %>%
group_split(grp, .keep = FALSE) %>%
setNames(c('main_df', 'addl_df')) %>%
list2env(.GlobalEnv)
-output
> str(main_df)
tibble [5 × 4] (S3: tbl_df/tbl/data.frame)
$ id : num [1:5] 1 2 3 4 5
$ age : num [1:5] 10 20 30 40 45
$ gender: Factor w/ 2 levels "F","M": 1 1 2 2 1
$ city : Factor w/ 4 levels "A","B","C","D": 1 2 3 4 4
> str(addl_df)
tibble [2 × 4] (S3: tbl_df/tbl/data.frame)
$ id : num [1:2] 7 8
$ age : num [1:2] 40 45
$ gender: Factor w/ 2 levels "F","M": 1 1
$ city : Factor w/ 4 levels "A","B","C","D": 3 4