Assume I have this data set. Please let me know if this is a duplicate; I am confused by this.
```r
library(tidymodels)

mt <- mtcars[, c("mpg", "hp", "drat", "am")]
mt$hp <- as.character(mt$hp)
mt$drat <- as.character(mt$drat)

dp_pipe1 <- recipe(mpg ~ hp + drat + am, data = mt) %>%
  update_role(c(hp, drat), new_role = "to_numeric") %>%
  step_mutate_at(has_role("to_numeric"), fn = as.numeric)
dp_pipe2 <- prep(dp_pipe1)
bake(dp_pipe2, NULL)
```
If you run the last step (`bake`), you will see that the values of `drat` have changed: in the original data they were 3.9, 3.9, 3.85, etc., but now they come out as 16, 16, 15, etc. Note that I am forcing a character conversion on the `mtcars` data only to show that I am doing a character-to-numeric conversion while processing the data.
Apologies if I have misread the documentation, but I cannot understand this. Please help.
EDIT 2:
Note that my data has no factors:
```r
> glimpse(mt)
Rows: 32
Columns: 4
$ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3…
$ hp   <chr> "110", "110", "93", "110", "175", "105",…
$ drat <chr> "3.9", "3.9", "3.85", "3.08", "3.15", "2…
$ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
```
If I run this instead:
```r
dp_pipe1 <- recipe(mpg ~ hp + drat + am, data = mt) %>%
  update_role(c(hp, drat), new_role = "to_numeric") %>%
  step_mutate_at(has_role("to_numeric"), fn = function(x) as.numeric(as.character(x)))
dp_pipe2 <- prep(dp_pipe1)
bake(dp_pipe2, NULL)
```
the code gives the right result.
EDIT 1:
I am not sure whether it is a bug, but if we use `fn = function(x) as.numeric(as.character(x))` in `step_mutate_at()`, it works fine.
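The wrapper works because `as.character()` returns a factor's labels rather than its internal integer codes, so it yields the same numbers whether or not the column reaches the step as a factor. A minimal base-R sketch; the helper name `to_num` is hypothetical and simply names the function passed as `fn`:

```r
# Hypothetical helper: the same function passed as `fn` in the working recipe
to_num <- function(x) as.numeric(as.character(x))

chr <- c("3.9", "3.9", "3.85", "3.08")
fct <- factor(chr)  # what the step receives if strings were turned into factors

as.numeric(fct)  # 3 3 2 1 (level codes, not the original values)
to_num(chr)      # 3.90 3.90 3.85 3.08
to_num(fct)      # 3.90 3.90 3.85 3.08 (same result from the factor)
```

Plain `as.numeric` only gives the right answer for the character input; the wrapper is correct for both.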
For 99% of modeling situations, factor encodings are better than character encodings for qualitative data. For that reason, recipes converts characters to factors by default. There is a `prep()` option (`strings_as_factors`) to avoid this.
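A minimal sketch of that option applied to the question's recipe. It assumes a recipes version in which `prep()` still accepts `strings_as_factors` (newer releases may relocate or deprecate this argument), loads only `recipes` rather than all of tidymodels, and uses the native pipe `|>` (R 4.1 or later). The object names are illustrative:

```r
library(recipes)

mt <- mtcars[, c("mpg", "hp", "drat", "am")]
mt$hp <- as.character(mt$hp)
mt$drat <- as.character(mt$drat)

# Default prep(): character predictors come back as factors
default_prep <- prep(recipe(mpg ~ hp + drat + am, data = mt))
class(bake(default_prep, new_data = NULL)$drat)  # "factor"

# The question's recipe, prepped without the string-to-factor conversion
rec <- recipe(mpg ~ hp + drat + am, data = mt) |>
  update_role(hp, drat, new_role = "to_numeric") |>
  step_mutate_at(has_role("to_numeric"), fn = as.numeric)

prepped <- prep(rec, strings_as_factors = FALSE)
baked <- bake(prepped, new_data = NULL)
head(baked$drat)  # 3.90 3.90 3.85 3.08 3.15 2.76
```

With the conversion switched off, the step sees the original character vector, so plain `as.numeric` is enough.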
What you are getting for `drat` is the integer that is the factor level index.
Here's an example:
```r
library(dplyr)

drat_0 <- mtcars$drat
drat_1 <- as.character(drat_0)
drat_2 <- factor(drat_1)
drat_3 <- as.numeric(drat_2)

tibble(drat_0, drat_1, drat_2, drat_3) %>% str()
#> tibble [32 × 4] (S3: tbl_df/tbl/data.frame)
#>  $ drat_0: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#>  $ drat_1: chr [1:32] "3.9" "3.9" "3.85" "3.08" ...
#>  $ drat_2: Factor w/ 22 levels "2.76","2.93",..: 16 16 15 5 6 1 7 11 17 17 ...
#>  $ drat_3: num [1:32] 16 16 15 5 6 1 7 11 17 17 ...
```
Created on 2023-07-18 with reprex v2.0.2
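To see where the 16 comes from, and to recover the values without a second trip through `as.character()`, one can index the factor's sorted levels directly; `as.numeric(levels(f))[f]` is the idiom recommended in the R FAQ (7.10). A short sketch that rebuilds the same factor:

```r
drat_2 <- factor(as.character(mtcars$drat))

# Levels are the 22 unique strings in sorted order, so "3.9" sits at position 16
match("3.9", levels(drat_2))  # 16

# Convert each level once, then index by the factor's integer codes
recovered <- as.numeric(levels(drat_2))[drat_2]
head(recovered)  # 3.90 3.90 3.85 3.08 3.15 2.76
```

Converting the 22 levels instead of all 32 values is also why the FAQ calls this form slightly more efficient.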