How do I use map() to apply the features() function to each data frame in a list in R?


Plotting my soil compaction data gives a convex-up curve. I need to determine the maximum y-value and the x-value which produces that maximum.

The 'features' package fits a smooth spline to the data and returns the features of the spline, including the y-maximum and critical x-value. I am having difficulty iterating the features() function over multiple samples, which are contained in a tidy list.

It seems that features() is not receiving the data. The code works fine when I use data for only one sample, but when I try to use the dot placeholder with square brackets inside map(), it loses track of the data.

Below is the code showing how this process works correctly for one sample, but not for an iteration.

#load packages
library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 3.6.3
#> Warning: package 'forcats' was built under R version 3.6.3
library(features)
#> Warning: package 'features' was built under R version 3.6.3
#> Loading required package: lokern
#> Warning: package 'lokern' was built under R version 3.6.3

# generate example data 
df <- tibble(
  sample = rep(LETTERS[1:3], each = 4),
  w = c(seq(0.08, 0.12, by = 0.0125),
        seq(0.09, 0.13, by = 0.0125),
        seq(0.10, 0.14, by = 0.0125)),
  d = c(1.86, 1.88, 1.88, 1.87,
        1.90, 1.92, 1.92, 1.91,
        1.96, 1.98, 1.98, 1.97)
)
df
#> # A tibble: 12 x 3
#>    sample      w     d
#>    <chr>   <dbl> <dbl>
#>  1 A      0.08    1.86
#>  2 A      0.0925  1.88
#>  3 A      0.105   1.88
#>  4 A      0.118   1.87
#>  5 B      0.09    1.9 
#>  6 B      0.102   1.92
#>  7 B      0.115   1.92
#>  8 B      0.128   1.91
#>  9 C      0.1     1.96
#> 10 C      0.112   1.98
#> 11 C      0.125   1.98
#> 12 C      0.138   1.97

# use the 'features' package to fit a smooth spline and extract the spline features, 
# including local y-maximum and critical point along x-axis.
# This works fine for one sample at a time:

sample1_data <- df %>% filter(sample == 'A')
sample1_features <- features(x= sample1_data$w, 
                             y= sample1_data$d, 
                             smoother = "smooth.spline")
sample1_features
#> $f
#>         fmean          fmin          fmax           fsd         noise 
#>  1.880000e+00  1.860000e+00  1.880000e+00  1.000000e-02  0.000000e+00 
#>           snr         d1min         d1max       fwiggle         ncpts 
#>  2.707108e+11 -9.100000e-01  1.970000e+00  9.349000e+01  1.000000e+00 
#> 
#> $cpts
#> [1] 0.1
#> 
#> $curvature
#> [1] -121.03
#> 
#> $outliers
#> [1] NA
#> 
#> attr(,"fits")
#> attr(,"fits")$x
#> [1] 0.0800 0.0925 0.1050 0.1175
#> 
#> attr(,"fits")$y
#> [1] 1.86 1.88 1.88 1.87
#> 
#> attr(,"fits")$fn
#> [1] 1.86 1.88 1.88 1.87
#> 
#> attr(,"fits")$d1
#> [1]  1.9732965  0.8533784 -0.5868100 -0.9061384
#> 
#> attr(,"fits")$d2
#> [1]  4.588832e-03 -1.791915e+02 -5.123866e+01  1.461069e-01
#> 
#> attr(,"class")
#> [1] "features"

# But when attempting to use the pipe and the map() function 
# to iterate over a list containing data for multiple samples, 
# using the typical map() placeholder dot will not index to the 
# list element/columns that are being passed to .f

df_split <- split(df, f= df[['sample']])
df_split
#> $A
#> # A tibble: 4 x 3
#>   sample      w     d
#>   <chr>   <dbl> <dbl>
#> 1 A      0.08    1.86
#> 2 A      0.0925  1.88
#> 3 A      0.105   1.88
#> 4 A      0.118   1.87
#> 
#> $B
#> # A tibble: 4 x 3
#>   sample     w     d
#>   <chr>  <dbl> <dbl>
#> 1 B      0.09   1.9 
#> 2 B      0.102  1.92
#> 3 B      0.115  1.92
#> 4 B      0.128  1.91
#> 
#> $C
#> # A tibble: 4 x 3
#>   sample     w     d
#>   <chr>  <dbl> <dbl>
#> 1 C      0.1    1.96
#> 2 C      0.112  1.98
#> 3 C      0.125  1.98
#> 4 C      0.138  1.97

df_split %>% map(.f = features, x = .[['w']], y= .[['d']], smoother = "smooth.spline")
#> Warning in min(x): no non-missing arguments to min; returning Inf
#> Warning in max(x): no non-missing arguments to max; returning -Inf
#> Error in seq.default(min(x), max(x), length = max(npts, length(x))): 'from' must be a finite number

Created on 2020-04-04 by the reprex package (v0.3.0)


Solution

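  • In the failing call, the dot is not purrr's per-element placeholder. It is the magrittr pipe's placeholder, which stands for the whole piped object, so .[['w']] evaluates to df_split[['w']]. That is NULL, because the list has elements A, B and C but none named w. features() therefore receives NULL for x and y, which produces the min()/max() warnings and the seq() error. To refer to one list element at a time, use a formula (~) lambda and .x. You could use group_split() to split the data by sample and map() to apply features() to each subset: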

    library(features)
    library(dplyr)
    library(purrr)
    
    list_model <- df %>% 
      group_split(sample) %>% 
      map(~ features(x = .x$w, y = .x$d, smoother = "smooth.spline"))
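
A follow-up sketch, not part of the original answer: group_split() returns an unnamed list, so the fits lose their sample labels. Assuming the example data from the question, the version below uses split() to keep the names, checks what the dot resolved to in the failing call, and collects each fit's critical point(s) and fmax value into one table. The column names w_crit and d_max are made up for illustration, and reading fmax as the maximum of the fitted curve relies on the $f output shown in the question.

```r
library(dplyr)
library(purrr)
library(tibble)
library(features)

# Same example data as in the question.
df <- tibble(
  sample = rep(LETTERS[1:3], each = 4),
  w = c(seq(0.08, 0.12, by = 0.0125),
        seq(0.09, 0.13, by = 0.0125),
        seq(0.10, 0.14, by = 0.0125)),
  d = c(1.86, 1.88, 1.88, 1.87,
        1.90, 1.92, 1.92, 1.91,
        1.96, 1.98, 1.98, 1.97)
)

# split() names the list elements after the samples ("A", "B", "C").
df_split <- split(df, df$sample)

# In the failing call the dot was the whole list, which has no element "w".
is.null(df_split[["w"]])
#> [1] TRUE

# Inside a ~ lambda, .x is one data frame at a time.
fits <- map(df_split,
            ~ features(x = .x$w, y = .x$d, smoother = "smooth.spline"))

# Pull the pieces of interest out of each fit and stack them.
# w_crit and d_max are illustrative column names (an assumption).
summary_tbl <- fits %>%
  map(~ tibble(w_crit = .x$cpts, d_max = .x$f[["fmax"]])) %>%
  bind_rows(.id = "sample")

summary_tbl
```

summary_tbl has one row per critical point reported by features(), labelled by sample, so a fit with several critical points would contribute several rows.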