Tl;dr - I'm trying to use the merge.data.table() function with row indexes and the suggestions given in the R documentation are not working.
My data is roughly:
library(data.table)
library(quantreg)
library(purrr)
foo <- expand.grid(seq(60001, 60050, by = 1),
                   c("18-30", "31-60", "61+"),
                   c("pre", "during", "after"))
foo <- as.data.table(foo)
setnames(foo, names(foo), c("zip", "agegroup", "period"))
foo <- cbind(foo,
quartile = floor(runif(n = nrow(foo), 1, 4)),
times = runif(n = nrow(foo), 18, 25))
I ran several quantile regressions on the data, subsetting by age group (at someone else's request).
v_tau <- c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99)
mq_age1 <- map(v_tau, ~rq(data = foo[agegroup == "18-30",],
times ~ quartile + period + quartile*period,
tau = .x))
I'm trying to merge a vector of fitted values from the rq() object with the original data table (I could also convert it to a data frame; it doesn't have to be a data table). This vector is shorter than the number of rows in the data table, so I've been trying to apply the answer given here for a plm() object, modified to account for the fact that my fitted values do not have multiple index attributes.
So, what I have been trying to do is join them by row index. I realize I can make another column with an explicit index, but I would like to avoid that because the fitted values are from a subset of the data and I am joining them to a subset of the data; adding an explicit index is possible, but not uniform or parsimonious, and will end up generating a lot of NAs that I don't want to deal with.
fitted <- mq_age1[[10]]$fitted.values
d_fitted <- cbind(attr(fitted, "index"),
fitted = fitted)
foo2 <- merge(foo[agegroup == "18-30",], d_fitted, by = 0, all.x = TRUE)
The merge() documentation says: "Columns to merge on can be specified by name, number or by a logical vector: the name "row.names" or the number 0 specifies the row names. If specified by name it must correspond uniquely to a named column in the input."
However, when I try this, it gives me the following error message:
Error in merge.data.table(foo[agegroup == "18-30", ], d_fitted, by = 0, :
A non-empty vector of column names for `by` is required.
Similarly, when I try using "row.names":
foo2 <- merge(foo[agegroup == "18-30",], d_fitted, by = "row.names", all.x = TRUE)
Error in merge.data.table(foo[agegroup == "18-30", ], d_fitted, by = "row.names", :
Elements listed in `by` must be valid column names in x and y
What is going on? Why can't I do this?
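For reference, here is a minimal sketch of the difference on toy data. The objects x_df, y_df, x_dt and y_dt and their values are made up for illustration and are not my real data. The same by = 0 call is run once on data frames and once on data tables, with the second call wrapped in tryCatch() so the script keeps running.

```r
library(data.table)

# Toy data (hypothetical, only for illustration)
x_df <- data.frame(id = c("a", "b", "c"), value = c(1.5, 2.5, 3.5))
y_df <- data.frame(fitted = c(10, 20))  # one row fewer than x_df

# On data frames, merge() dispatches to merge.data.frame, which accepts
# by = 0 and matches on row names; the unmatched row gets NA for `fitted`
m_df <- merge(x_df, y_df, by = 0, all.x = TRUE)

# On data tables, merge() dispatches to merge.data.table, which expects
# column names in `by`; the error is captured instead of stopping the script
x_dt <- as.data.table(x_df)
y_dt <- as.data.table(y_df)
res <- tryCatch(merge(x_dt, y_dt, by = 0, all.x = TRUE),
                error = function(e) e)

print(m_df)
print(conditionMessage(res))
```

The data frame result carries an extra Row.names column holding the row names that were matched on.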
Found the answer: @r2evans kindly pointed out that base R's merge() method for data frames has this functionality, while data.table's merge() method (merge.data.table) does not. Converting to a data frame first with
foo <- as.data.frame(foo)
and then running
foo2 <- merge(foo[foo$agegroup == "18-30",], d_fitted, by = 0, all.x = TRUE)
did the trick. Thanks!
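For anyone who would rather stay in data.table, one possible workaround is a temporary row id on the subset only, so no NAs appear for the other age groups. This is just a sketch on toy data: dt, sub, rn and fitted_vals are hypothetical names, fitted_vals stands in for something like mq_age1[[10]]$fitted.values, and it assumes the fitted values come back in the same row order as the subset that was passed to the model.

```r
library(data.table)

# Toy data (hypothetical, only for illustration)
dt <- data.table(agegroup = c("18-30", "31-60", "18-30", "61+"),
                 times = c(20.1, 22.4, 19.8, 23.0))

# Work on the subset that was used to fit the model
sub <- dt[agegroup == "18-30"]

# Stand-in for the model's fitted values (assumed to be in subset row order)
fitted_vals <- c(20.0, 19.9)

# Give the subset an explicit row id and build a matching lookup table
sub[, rn := .I]
d_fitted <- data.table(rn = seq_along(fitted_vals), fitted = fitted_vals)

# merge.data.table needs a real column name in `by`
out <- merge(sub, d_fitted, by = "rn", all.x = TRUE)
print(out)
```

The rn column can be dropped afterwards with out[, rn := NULL] if it is not needed.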