Someone had this same question but they were using the splines
library whereas I am using tidymodels
.
I want to fit a cubic spline and split the domain of the independent variable into, say, 6 bins (i.e. make 5 cuts in its domain).
I believe this is done with step_bs()
(or step_ns()
in the case of natural splines).
I was unable to find which parameter sets the number of knots in the documentation. Moreover, it seems that splines::ns()
can be passed to the options
parameter, but the Readme is not available.
You may find this answer helpful in understanding the relationship between knots and degrees of freedom. You can set both deg_free
and degree
(the polynomial degree) in step_bs()
:
library(recipes)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#>
#> step
data(biomass, package = "modeldata")
biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]
rec <- recipe(HHV ~ carbon + hydrogen + oxygen,
data = biomass_tr) %>%
step_bs(carbon, deg_free = 7, degree = 4)
## training data
prep(rec) %>% bake(new_data = biomass_tr)
#> # A tibble: 456 × 10
#> hydrogen oxygen HHV carbon_bs_1 carbon_bs_2 carbon_bs_3 carbon_bs_4
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 5.64 42.9 20.0 0 0 0.489 0.494
#> 2 5.7 41.3 19.2 0 0.000000158 0.502 0.484
#> 3 5.8 46.2 18.3 0 0.000812 0.575 0.421
#> 4 4.97 35.6 18.2 0.000196 0.0256 0.669 0.305
#> 5 5.4 40.7 18.4 0.000000163 0.00476 0.619 0.375
#> 6 5.75 40.2 18.5 0.000102 0.0202 0.663 0.317
#> 7 5.99 38.2 18.7 0 0.00263 0.603 0.393
#> 8 5.7 39.7 18.3 0.0000470 0.0156 0.655 0.329
#> 9 5.5 40.9 18.6 0 0.0000451 0.532 0.460
#> 10 5.9 40 18.9 0 0.00293 0.606 0.390
#> # … with 446 more rows, and 3 more variables: carbon_bs_5 <dbl>,
#> # carbon_bs_6 <dbl>, carbon_bs_7 <dbl>
## testing data
prep(rec) %>% bake(new_data = biomass_te)
#> # A tibble: 80 × 10
#> hydrogen oxygen HHV carbon_bs_1 carbon_bs_2 carbon_bs_3 carbon_bs_4
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 5.67 47.2 18.3 0.00000387 0.00795 0.635 0.357
#> 2 5.5 48.1 17.6 0.00261 0.0730 0.687 0.237
#> 3 5.5 49.1 17.2 0.00431 0.0907 0.685 0.220
#> 4 6.1 37.3 18.9 0.00000294 0.00750 0.633 0.359
#> 5 6.32 42.8 20.5 0 0.0000535 0.534 0.458
#> 6 5.5 41.7 18.5 0.000751 0.0434 0.682 0.274
#> 7 5.23 54.1 15.1 0.0358 0.229 0.610 0.124
#> 8 4.66 33.8 16.2 0.00687 0.111 0.680 0.201
#> 9 4.4 31.1 11.1 0.294 0.396 0.224 0.0160
#> 10 3.77 23.7 10.8 0.339 0.376 0.175 0.0107
#> # … with 70 more rows, and 3 more variables: carbon_bs_5 <dbl>,
#> # carbon_bs_6 <dbl>, carbon_bs_7 <dbl>
Created on 2022-04-18 by the reprex package (v2.0.1)