
Merging on row index in R (by = 0 and by = "row.names" not working)


TL;DR: I'm trying to use merge.data.table() to join on row indexes, and the options suggested in the R documentation (by = 0 and by = "row.names") are not working.

My data is roughly:

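library(data.table)
library(quantreg)
library(purrr)

set.seed(123)  # make the random columns reproducible

# 50 zip codes x 3 age groups x 3 periods = 450 rows
foo <- expand.grid(seq(60001, 60050, by = 1),
                   c("18-30", "31-60", "61+"),
                   c("pre", "during", "after"))
foo <- as.data.table(foo)
setnames(foo, names(foo), c("zip", "agegroup", "period"))

# random quartile (1, 2 or 3) and response time (between 18 and 25)
foo <- cbind(foo, 
             quartile = floor(runif(n = nrow(foo), 1, 4)),
             times = runif(n = nrow(foo), 18, 25))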
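
Because a row-name join depends on which rows end up in a subset, it may help to see how expand.grid() orders its rows. The toy sketch below uses small, made-up inputs and base R only. It relies on two behaviors: the first factor in expand.grid() varies fastest, so one age group's rows are not contiguous, and subsetting a data.frame keeps the original row names (whereas a data.table subset is renumbered from 1).

```r
# Toy version of the data: 2 zips x 2 age groups x 2 periods (hypothetical values)
g <- expand.grid(zip = 1:2,
                 agegroup = c("a", "b"),
                 period = c("pre", "post"),
                 stringsAsFactors = FALSE)

# The first column varies fastest, so age group "a" sits in rows 1, 2, 5 and 6
idx_a <- which(g$agegroup == "a")
print(idx_a)

# Subsetting a data.frame keeps the original row names of the selected rows
g_a <- g[g$agegroup == "a", ]
print(rownames(g_a))
```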

I ran several quantile regressions on the data, subsetting by age group (at someone else's request).

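# quantiles to estimate: 0.01, 0.025, 0.05 to 0.95 in steps of 0.05, 0.975, 0.99
v_tau <- c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99)

# one rq() fit per quantile, using only the 18-30 age group
mq_age1 <- map(v_tau, ~rq(data = foo[agegroup == "18-30",], 
               times ~ quartile + period + quartile*period,
               tau = .x))  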

I'm trying to merge the vector of fitted values from an rq() object back into the original data table (I could also convert it to a data.frame; it doesn't have to be a data.table). This vector is shorter than the number of rows in the data table, so I've been trying to adapt the answer given here for a plm() object, modified to account for the fact that my fitted values don't have multiple index attributes.

So what I've been trying to do is join them by row index. I realize I could add a column with an explicit index, but I'd like to avoid that: the fitted values come from a subset of the data and I'm joining them to a subset of the data, so an explicit index is possible but neither uniform nor parsimonious, and it would generate a lot of NAs that I don't want to deal with.
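
If the fitted values carry the row names of the rows they were fit on, a merge is not strictly required: indexing the named vector by the subset's row names lines the values up, and only rows without a fitted value get NA. A minimal base-R sketch with made-up numbers; df_sub and fitted_vals are hypothetical stand-ins for the subset and for mq_age1[[10]]$fitted.values.

```r
# Hypothetical subset whose row names are the original row positions
df_sub <- data.frame(times = c(20.1, 22.4, 19.8, 24.0),
                     row.names = c("1", "2", "5", "6"))

# Hypothetical fitted values named by row; row "5" has no fitted value
fitted_vals <- c("1" = 20.5, "2" = 21.9, "6" = 23.1)

# Look up each row name in the named vector; names that are absent give NA
df_sub$fitted <- unname(fitted_vals[rownames(df_sub)])
print(df_sub)
```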

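# fitted values from the 10th model, i.e. tau = v_tau[10] = 0.40
fitted <- mq_age1[[10]]$fitted.values
d_fitted <- cbind(attr(fitted, "index"),
                    fitted = fitted)

# join on row names (this is the call that fails)
foo2 <- merge(foo[agegroup == "18-30",], d_fitted, by = 0, all.x = TRUE) 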
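
For reference, this sketch shows what d_fitted ends up being. It uses a hypothetical named vector in place of mq_age1[[10]]$fitted.values. Assuming the fitted values are a plain named numeric vector (true of lm() fits and, presumably, of rq() fits), attr(fitted, "index") returns NULL, cbind() drops that NULL, and the result is a one-column matrix whose row names come from the vector's names. Those row names are what by = 0 would match on.

```r
# Hypothetical fitted values named by row, standing in for mq_age1[[10]]$fitted.values
fitted <- c("1" = 20.5, "2" = 21.9, "6" = 23.1)

# A plain numeric vector has no "index" attribute
print(attr(fitted, "index"))

# cbind() ignores the NULL argument and keeps the vector's names as row names
d_fitted <- cbind(attr(fitted, "index"), fitted = fitted)
print(dim(d_fitted))
print(rownames(d_fitted))
print(colnames(d_fitted))
```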

The merge() documentation says: "Columns to merge on can be specified by name, number or by a logical vector: the name "row.names" or the number 0 specifies the row names. If specified by name it must correspond uniquely to a named column in the input."
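
When both inputs are plain data.frames, the documented behavior can be checked directly in base R. In this toy sketch (made-up values), by = 0 and by = "row.names" give identical results. The matched row names come back as a character column called Row.names at the left of the result.

```r
# Two small data.frames sharing some row names (hypothetical values)
x <- data.frame(times = c(20.1, 22.4, 19.8), row.names = c("1", "2", "5"))
y <- data.frame(fitted = c(20.5, 23.1), row.names = c("1", "5"))

# Both spellings refer to the row names; all.x = TRUE keeps unmatched rows of x
m0  <- merge(x, y, by = 0, all.x = TRUE)
mrn <- merge(x, y, by = "row.names", all.x = TRUE)

print(m0)
print(identical(m0, mrn))
```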

However, when I try this, it gives me the following error message:

Error in merge.data.table(foo[agegroup == "18-30", ], d_fitted, by = 0,  : 
  A non-empty vector of column names for `by` is required.

Similarly, when I try using "row.names":

foo2 <- merge(foo[agegroup == "18-30",], d_fitted, by = "row.names", all.x = TRUE)
Error in merge.data.table(foo[agegroup == "18-30", ], d_fitted, by = "row.names",  : 
  Elements listed in `by` must be valid column names in x and y

What is going on? Why can't I do this?
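
Regardless of which merge() method ends up running, one way to sidestep the row-name question is to copy the row names into an ordinary column and merge on that column. Methods that only accept column names, such as merge.data.table(), can join on it too. When converting a data.frame to a data.table, as.data.table(df, keep.rownames = TRUE) creates such a column, named rn. The base-R sketch below uses toy data, and add_rn() is a hypothetical helper.

```r
# Toy inputs with meaningful row names (hypothetical values)
x <- data.frame(times = c(20.1, 22.4, 19.8), row.names = c("1", "2", "5"))
y <- data.frame(fitted = c(20.5, 23.1), row.names = c("1", "5"))

# Hypothetical helper: copy the row names into an ordinary character column
add_rn <- function(df) {
  df$rn <- rownames(df)
  df
}

# Join on the named column instead of on the row names themselves
res <- merge(add_rn(x), add_rn(y), by = "rn", all.x = TRUE)
print(res)
```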


Solution

  • Found the answer: @r2evans kindly pointed out that base R's merge() method for data.frames (merge.data.frame) supports by = 0 and by = "row.names", while data.table's method (merge.data.table) only accepts column names. Running

    foo <- as.data.frame(foo)

    before

    foo2 <- merge(foo[foo$agegroup == "18-30",], d_fitted, by = 0, all.x = TRUE)

    did the trick. Thanks!
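
Putting the pieces together, here is a minimal base-R sketch of the working approach on a smaller, made-up version of the data. To keep it self-contained, lm() stands in for rq(). This is an assumption: all that matters is that the fitted values are named by the row names of the data the model was fit on. Two details seem worth checking:

- The model should be fit on the same data.frame subset that is merged, so the row names agree. A data.table subset is renumbered from 1, so fitted values from a model fit on foo[agegroup == "18-30",] may not line up with the row names of a data.frame subset.
- merge() sorts on Row.names as character ("1", "10", "11", ...), so the result is put back in row order at the end.

```r
set.seed(1)  # reproducible toy data

# Smaller, hypothetical version of the question's data, built as a data.frame
foo <- expand.grid(zip = 60001:60010,
                   agegroup = c("18-30", "31-60", "61+"),
                   period = c("pre", "during", "after"),
                   stringsAsFactors = FALSE)
foo$quartile <- floor(runif(nrow(foo), 1, 4))
foo$times <- runif(nrow(foo), 18, 25)

# Subset once and reuse it, so model rows and merged rows share row names
sub <- foo[foo$agegroup == "18-30", ]

# lm() is a stand-in for rq(); its fitted values are named by row name
fit <- lm(times ~ quartile * period, data = sub)
d_fitted <- cbind(fitted = fitted(fit))

# by = 0 joins on row names; this works because x is a data.frame
foo2 <- merge(sub, d_fitted, by = 0, all.x = TRUE)

# merge() sorts Row.names as character ("1", "10", "31", ...); restore row order
foo2 <- foo2[order(as.integer(foo2$Row.names)), ]
head(foo2)
```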