r tidyverse factors

How can I rename factors based on the column names of another data frame?


I have a column in a dataframe holding subjects:

sub <- c("A", "A", "B", "C", "C", "C", "D", "E", "F", "F")
subjects <- data.frame(sub)

I have another data frame containing columns of subjects (each subject appears in only one column):

one <- c("A", "C", "F")
two <- c("B", "D", NA)
three <- c("E", NA, NA)
newsubjects <- data.frame(one, two, three)

I want to rename each subject in the first data frame to the name of the column in the second data frame that contains that subject.

So for example, I want the A, C, and F subjects in the first data frame to be renamed 'one'. Doing this manually would take a long time, so I'm hoping there's a way to use the columns in the second data frame to do this.

I've tried a bunch of stuff with forcats::fct_recode and levels, but nothing works because I'm not using these functions correctly. For example, IIRC, one of my attempts looked something like this:

subjects %>%
      mutate(new_var = forcats::fct_recode(sub,
            !!! setNames(as.character(subjects$sub), newsubjects$one)))

I know this is completely wrong. Part of the problem is that it's difficult for me to articulate my problem in a way that returns relevant search results. Thank you for any help you can provide, I appreciate it.


Solution

  • Using purrr::map(), build a named list that maps each column name in newsubjects to its non-NA values. Then splice this list into forcats::fct_collapse() with !!! to recode the values in subjects.

    library(purrr)
    library(forcats)
    
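    # Named list: each column name maps to the non-NA subjects in that column
    new_ids <- map(newsubjects, ~ .x[!is.na(.x)])
    
    # Splice the list so each name becomes the new level for its subjects
    subjects$sub <- fct_collapse(subjects$sub, !!!new_ids)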
    
    subjects
    
         sub
    1    one
    2    one
    3    two
    4    one
    5    one
    6    one
    7    two
    8  three
    9    one
    10   one
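
For comparison, here is a minimal base R sketch of the same idea. It is not part of the original answer: it reshapes newsubjects into a long lookup table with stack() and then uses match() to find each subject's column. The names lookup and group are illustrative choices, and the result is written to a new column rather than overwriting sub. It assumes the columns of newsubjects are character vectors, so stringsAsFactors = FALSE is set explicitly for older R versions.

```r
# Data from the question
sub <- c("A", "A", "B", "C", "C", "C", "D", "E", "F", "F")
subjects <- data.frame(sub, stringsAsFactors = FALSE)
newsubjects <- data.frame(
  one   = c("A", "C", "F"),
  two   = c("B", "D", NA),
  three = c("E", NA, NA),
  stringsAsFactors = FALSE
)

# stack() turns the wide table into two columns:
#   values = the subject, ind = the column it came from.
# na.omit() drops the padding NAs.
lookup <- na.omit(stack(newsubjects))

# For each subject, find its row in the lookup table and take the column name.
# `group` is a hypothetical name for the new column.
subjects$group <- as.character(lookup$ind[match(subjects$sub, lookup$values)])

subjects
```

A subject that does not appear anywhere in newsubjects would get NA in group with this approach, which makes unmatched subjects easy to spot.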