I have a data frame with these values dummy vales and I want to do lm regression on them. One of the variables is a grouped continuous variable as shown below
df <- data.frame("y" = c(10, 11, 12, 13, 14),
"x" = as.factor(c("100-102", "103-105", "106-108", "109-111", "112-114")))
I want to regress y~x, One way is to replace the x factors with their mean numeric values. This is easily done using regular expression.
Another way is to create the additional rows and expand your dataset so it looks like this
data.frame("y" = c(10, 10, 10, 11, 11, 11......),
"x" = c(100, 101, 102, 103, 104, 105......))
Is there a function that will do this?
I'm thinking of first creating additional variables like x1, x2, x3 and then use reshape2 package to convert the x columns to rows.
A data.table
solution. This should be really fast on large data.frame
's as well.
require(data.table)
dt <- data.table(df, key="y")
dt[, list(x=seq(sub("-.*$", "", x), sub(".*-", "", x))),by=y]
If you have more columns and you don't want each combinations while splitting by column x
, then this is the code to use:
require(data.table)
dt <- data.table(df)
# get all column names except "x"
key.cols <- setdiff(names(df), "x")
# set the data.table columns to key.cols
setkeyv(dt, key.cols)
dt.out <- dt[, list(x=seq(sub("-.*$", "", x), sub(".*-", "", x))), by = key.cols]
This should give you what you expect.