Search code examples
rerror-handlingdplyrregressionquantreg

How to add a column of fitted values to a data frame by group?


Say I have a data frame like this:

X <- data_frame(
  x = rep(seq(from = 1, to = 10, by = 1), 3),
  y = 2*x + rnorm(length(x), sd = 0.5),
  g = rep(LETTERS[1:3], each = length(x)/3))

How can I fit a regression y~x grouped by variable g and add the values from the fitted and resid generic methods to the data frame?

I know I can do:

A <- X[X$g == "A",]
mA <- with(A, lm(y ~ x))
A$fit <- fitted(mA)
A$res <- resid(mA)

B <- X[X$g == "B",]
mB <- with(B, lm(y ~ x))
B$fit <- fitted(mB)
B$res <- resid(mB)

C <- X[X$g == "C",]
mC <- with(B, lm(y ~ x))
C$fit <- fitted(mC)
C$res <- resid(mC)

And then rbind(A, B, C). However, in real life I am not using lm (I'm using rqss in the quantreg package). The method occasionally fails, so I need error handling, where I'd like to place NA all the rows that failed. Also, there are way more than 3 groups, so I don't want to just keep copying and pasting code for each group.

I tried using dplyr with do but didn't make any progress. I was thinking it might be something like:

make_qfits <- function(data) {
  data %>%
    group_by(g) %>%
    do(failwith(NULL, rqss), formula = y ~ qss(x, lambda = 3))
}

Would this be easy to do by that approach? Is there another way in base R?


Solution

  • For the lm models you could try

    library(nlme)     # lmList to do lm by group
    library(ggplot2)  # fortify to get out the fitted/resid data
    do.call(rbind, lapply(lmList(y ~ x | g, data=X), fortify))
    

    This gives you the residual and fitted data in ".resid" and ".fitted" columns as well as a bunch of other fit data. By default the rownames will be prefixed with the letters from g.

    With the rqss models that might fail

    do.call(rbind, lapply(split(X, X$g), function(z) {
        fit <- tryCatch({
            rqss(y ~ x, data=z)
        }, error=function(e) NULL)
        if (is.null(fit)) data.frame(resid=numeric(0), fitted=numeric(0))
        else data.frame(resid=fit$resid, fitted=fitted(fit))
    }))