In a previous question I experienced an issue by reshape
ing an
tbl
object.
After hours I realized it was because I was using tbl
and not data.frame
. So to use reshape()
and keep the tbl
environment I have to do:
mydata %>% as.data.frame %>% reshape(, ...) %>% as.tbl
Is there another way?
More so that there's an answer than any other reason, here are four options to consider.
First, if you want to use reshape
along with "dplyr", you'll have to make use of the new.row.names
argument in reshape
, and set them to a sequence of the number of rows you expect your reshaped dataset to have. The calculation is easy. Take the length of one batch of columns that are going from the wide form to the long form and multiply it by the number of rows in the original dataset.
This approach is sure to make Hadley squirm with displeasure, so use at your own risk.
mydf <- tbl_df(mydf)
class(mydf)
# [1] "tbl_df" "tbl" "data.frame"
mydf %>%
reshape(
idvar="g_id",
direction="long",
varying=list(c(5:14),c(15:24)),
v.names=c("PLC","P"),
new.row.names = seq_len(length(5:14) * nrow(mydf)))
Another approach that might make Hadley squirm, but a little bit less, would be to use melt
, but the melt
from "data.table", not from "reshape2". This would, of course, require you to convert your tbl_df
to a data.table
, so similar to your current approach, it requires another step.
library(data.table)
mydf %>%
data.table %>%
melt(measure.vars = patterns("PLC[0-9]+", "P[0-9]+"),
value.name = c("PLC", "P"))
Remaining within the Hadleyverse, you can try "tidyr" (as suggested by @DavidArenburg). It's not as pretty as he might have had in mind, because it requires first making a very long dataset and then re-widening it, unlike the melt
approach above that does the not-so-long reshaping in one step.
library(tidyr)
mydf %>%
gather(var, val, starts_with("P")) %>%
mutate(var = gsub("([A-Z]+)", "\\1_", var)) %>% ## you can probably be clever and...
separate(var, into = c("variable", "time")) %>% ## come up with some fancier regex
spread(variable, val)
Finally, there's also merged.stack
from my "splitstackshape" package. With it, the approach would be like:
library(splitstackshape)
merged.stack(mydf, var.stubs = c("PLC", "P"), sep = "var.stubs")