Use a second dataset as factor levels for the corresponding numerical values in the first dataset


I have two datasets that refer to the same data. However, one has strings as answers to questions, and the other has the corresponding codes.

library(data.table)
dat_string <- fread("str_col1 str_col2 numerical_col
                     One   Alot          1
                     Two   Alittle       0")     

dat_codes <- fread("code_col1 code_col2 numerical_col
                     0     3    1
                     1     5    0")

I would like to combine both datasets so that the strings get attached to the corresponding codes as factor labels (see this example), for all string columns in dat_string.

Please note that the column names can have any format and do not necessarily follow the format from the example.

What would be the easiest way to do this?

Desired outcome:

dat_codes$code_col1 <- factor(dat_codes$code_col1, levels=c("0", "1"),
labels=c("One", "Two"))    

attributes(dat_codes$code_col1)$levels
[1] "One" "Two"

Solution

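  • If I understand your edit correctly, both tables have the same shape and the same row order; one simply holds the labels and the other holds the codes. If that is the case, it should be even more straightforward than my original response. Find the positions of the character columns in dat_string, then convert the column at each of those positions in dat_codes to a factor, using its unique codes as levels and the unique strings (in the same order of first appearance) as labels: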

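    # Positions of the string columns; columns are matched by position, not by name
    code_cols <- which(sapply(dat_string, is.character))

    for (j in code_cols) {
        # unique() keeps first-appearance order, so codes and labels pair up
        # as long as both tables share the same row order
        set(dat_codes, j = j, value = factor(
            dat_codes[[j]],
            levels = unique(dat_codes[[j]]),
            labels = unique(dat_string[[j]])
        ))
    }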
    
    
    dat_codes
    #    code_col1 code_col2 numerical_col
    # 1:       One      Alot             1
    # 2:       Two   Alittle             0
    
    dat_codes$code_col1
    # [1] One Two
    # Levels: One Two
    
    sapply(dat_codes, class)
    # code_col1     code_col2 numerical_col
    #  "factor"      "factor"     "integer"
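
The loop above pairs `unique()` codes with `unique()` strings, which is fine when each code corresponds to exactly one string. As a rough sketch of a more defensive variant, the snippet below builds the code-to-label map from row-wise pairs, stops if the mapping is not one-to-one, and orders the levels by code. The helper name `label_codes` and the extra third row in the sample data are assumptions made for illustration; they are not part of the question.

```r
library(data.table)

# Sample data from the question, with one extra (hypothetical) row
# so that codes and labels repeat
dat_string <- fread("str_col1 str_col2 numerical_col
                     One   Alot          1
                     Two   Alittle       0
                     One   Alittle       1")

dat_codes <- fread("code_col1 code_col2 numerical_col
                     0     3    1
                     1     5    0
                     0     5    1")

# Hypothetical helper: turn one code column into a factor, taking the
# labels from the matching string column (same row order assumed)
label_codes <- function(codes, labels) {
  # Distinct (code, label) pairs observed row by row
  map <- unique(data.table(code = codes, label = labels))
  # Stop if a code maps to several labels, or a label to several codes
  stopifnot(!anyDuplicated(map$code), !anyDuplicated(map$label))
  # Order the levels by code instead of by first appearance
  setorder(map, code)
  factor(codes, levels = map$code, labels = map$label)
}

# Both tables must describe the same rows
stopifnot(nrow(dat_string) == nrow(dat_codes))

# Columns are matched by position, so the column names can be anything
label_cols <- which(sapply(dat_string, is.character))
for (j in label_cols) {
  set(dat_codes, j = j, value = label_codes(dat_codes[[j]], dat_string[[j]]))
}
```

Ordering the levels by code mirrors the `levels=c("0", "1")` ordering in the desired outcome; if first-appearance order is preferred, the `setorder()` call can simply be dropped.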