Make only some features dummyVars


library(ggplot2)  # provides the diamonds data set
library(dplyr)    # provides %>% and mutate()

my_diamonds <- diamonds %>% mutate(cut = as.character(cut),
                                   color = as.character(color),
                                   clarity = as.character(clarity))

I would like to create a new dataframe with just cut and color as dummyVars.

However, I cannot get the first block of the code below to work:

# make cut and color dummy vars
dummy <- caret::dummyVars("cut + color",
                            data = my_diamonds, fullRank = F, sep = ".")

# now create the dummy vars as new dataframe training data
training_data <- predict(dummy, my_diamonds) %>% as.data.frame()

This piece:

# make cut and color dummy vars
dummy <- caret::dummyVars("cut + color",
                            data = my_diamonds, fullRank = F, sep = ".")

Gives: Error in eval(parse(text = x, keep.source = FALSE)[[1L]]) : object 'color' not found.

Also tried:

dummy <- caret::dummyVars(~ "cut + color",
                            data = my_diamonds, fullRank = F, sep = ".")

Which gives: Error in terms.formula(formula, data = data) : invalid model formula in ExtractVars

How can I create a new dataframe based on my_diamonds where cut and color are dummy vars?


Solution

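  • One small issue: the ~ has to be part of the formula, not placed in front of a quoted string. "cut + color" has no ~ at all, so R evaluates it as an ordinary expression (hence object 'color' not found), and ~ "cut + color" makes the right-hand side a string literal rather than two variable names. Use "~ cut + color" or just ~ cut + color: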

    dummy <- caret::dummyVars(~ cut + color,
                              data = my_diamonds, fullRank = FALSE, sep = ".")
    training_data <- predict(dummy, my_diamonds) %>% as.data.frame()
    head(training_data)
    #   cutFair cutGood cutIdeal cutPremium cutVery Good colorD colorE colorF colorG colorH colorI colorJ
    # 1       0       0        1          0            0      0      1      0      0      0      0      0
    # 2       0       0        0          1            0      0      1      0      0      0      0      0
    # 3       0       1        0          0            0      0      1      0      0      0      0      0
    # 4       0       0        0          1            0      0      0      0      0      0      1      0
    # 5       0       1        0          0            0      0      0      0      0      0      0      1
    # 6       0       0        0          0            1      0      0      0      0      0      0      1
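If the column names are held as strings, which is what the quoted attempts in the question seem to be reaching for, the formula can be built from a character vector with base R's reformulate(). The following is a minimal sketch along those lines, not part of the original answer; dummy_cols and dummy_formula are names made up for illustration.

```r
library(ggplot2)  # provides the diamonds data set
library(dplyr)
library(caret)

my_diamonds <- diamonds %>% mutate(cut = as.character(cut),
                                   color = as.character(color),
                                   clarity = as.character(clarity))

# Hypothetical helper names: the columns to expand, kept as strings
dummy_cols <- c("cut", "color")

# reformulate() turns the character vector into a one-sided formula
dummy_formula <- reformulate(dummy_cols)
dummy_formula
# ~cut + color

dummy <- dummyVars(dummy_formula,
                   data = my_diamonds, fullRank = FALSE, sep = ".")

# Same result as writing ~ cut + color by hand
training_data <- predict(dummy, my_diamonds) %>% as.data.frame()
```

This keeps the list of columns in one place, so adding or removing a variable means editing only dummy_cols.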