Tags: r, grouping, multiple-columns, smoothing, loess

Fit loess smoothers for multiple groups across multiple numeric variables


I need to fit a separate loess smoother for each level of a grouping variable (Animal) and for each of several numeric columns (Var1, Var2), and then extract the fitted values.

I found code that does this for one variable at a time:

# Create dataframe 1
OneVarDF <- data.frame(Day = sample(1:50, 200, replace = TRUE),
                 Animal = rep(c("Greyhound", "Horse"), each = 100),
                 Var1 = c(sample(2:10, 100, replace = TRUE), sample(15:20, 100, replace = TRUE)))


library(dplyr)
library(tidyr)
library(purrr)

# Fit one loess model per Animal and pull out its fitted values
Models <- OneVarDF %>%
  tidyr::nest(data = -Animal) %>%
  dplyr::mutate(m = purrr::map(data, loess, formula = Var1 ~ Day, span = 0.30),
                fitted = purrr::map(m, `[[`, "fitted")
  )

# Drop the model objects and expand the data and fitted values back into rows
Results <- Models %>%
  dplyr::select(-m) %>%
  tidyr::unnest(cols = c(data, fitted))

This "Results" dataframe is essential for downstream tasks (detrending many non-parametric distributions).

How can we achieve this with a dataframe with multiple numeric columns (code below), and extract a "Results" dataframe? Thank you.

# Create dataframe 2
TwoVarDF <- data.frame(Day = sample(1:50, 200, replace = TRUE),
                       Animal = rep(c("Greyhound", "Horse"), each = 100),
                       Var1 = c(sample(2:10, 100, replace = TRUE), sample(15:20, 100, replace = TRUE)),
                       Var2 = c(sample(22:27, 100, replace = TRUE), sample(29:35, 100, replace = TRUE)))

Solution

  • We can reshape the data to long format with pivot_longer, group by Animal and the column name, and apply loess to each combination. The new model column then holds the fitted values for each row.

    library(dplyr)
    library(tidyr)
    
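    TwoVarDF %>%
      # One row per observation and variable: columns name (Var1/Var2) and value
      pivot_longer(cols = starts_with('Var')) %>%
      group_by(Animal, name) %>%
      # Fit loess within each Animal/variable group and keep its fitted values
      mutate(model = loess(value~Day, span = 0.3)$fitted)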
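To get something shaped like the "Results" dataframe from the question, the long output can be stored and, if needed, spread back to one row per observation. The following is a minimal sketch rather than a definitive implementation. The names row_id, fitted, resid and ResultsWide are assumptions introduced here for illustration, and the residual column is only one possible way to prepare for the detrending step mentioned in the question.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the toy data is reproducible

# Toy data with the same layout as TwoVarDF in the question
TwoVarDF <- data.frame(
  Day    = sample(1:50, 200, replace = TRUE),
  Animal = rep(c("Greyhound", "Horse"), each = 100),
  Var1   = c(sample(2:10, 100, replace = TRUE), sample(15:20, 100, replace = TRUE)),
  Var2   = c(sample(22:27, 100, replace = TRUE), sample(29:35, 100, replace = TRUE))
)

# Long format: one row per observation and variable, with fitted values
Results <- TwoVarDF %>%
  mutate(row_id = row_number()) %>%            # hypothetical key to identify each original row
  pivot_longer(cols = starts_with("Var")) %>%
  group_by(Animal, name) %>%
  mutate(fitted = loess(value ~ Day, span = 0.3)$fitted,
         resid  = value - fitted) %>%          # observed minus smoothed value
  ungroup()

# Optional wide format: one row per original observation,
# with columns such as value_Var1, fitted_Var1, resid_Var1, ...
ResultsWide <- Results %>%
  pivot_wider(id_cols = c(row_id, Day, Animal),
              names_from = name,
              values_from = c(value, fitted, resid))
```

The row_id column matters only for the wide reshape: Day values repeat within an Animal, so without a unique key pivot_wider could not tell the original rows apart.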