Search code examples
rdataframedplyrreshape

Is there a way to use reshape in tbl?


In a previous question I experienced an issue by reshapeing an tbl object.

After hours I realized it was because I was using tbl and not data.frame. So to use reshape() and keep the tbl environment I have to do:

mydata %>% as.data.frame %>% reshape(, ...) %>% as.tbl

Is there another way?


Solution

  • More so that there's an answer than any other reason, here are four options to consider.

    First, if you want to use reshape along with "dplyr", you'll have to make use of the new.row.names argument in reshape, and set them to a sequence of the number of rows you expect your reshaped dataset to have. The calculation is easy. Take the length of one batch of columns that are going from the wide form to the long form and multiply it by the number of rows in the original dataset.

    This approach is sure to make Hadley squirm with displeasure, so use at your own risk.

    mydf <- tbl_df(mydf)
    class(mydf)
    # [1] "tbl_df"     "tbl"        "data.frame"
    
    mydf %>% 
      reshape(
        idvar="g_id",
        direction="long",
        varying=list(c(5:14),c(15:24)), 
        v.names=c("PLC","P"),
        new.row.names = seq_len(length(5:14) * nrow(mydf)))
    

    Another approach that might make Hadley squirm, but a little bit less, would be to use melt, but the melt from "data.table", not from "reshape2". This would, of course, require you to convert your tbl_df to a data.table, so similar to your current approach, it requires another step.

    library(data.table)
    mydf %>%
      data.table %>%
      melt(measure.vars = patterns("PLC[0-9]+", "P[0-9]+"), 
           value.name = c("PLC", "P"))
    

    Remaining within the Hadleyverse, you can try "tidyr" (as suggested by @DavidArenburg). It's not as pretty as he might have had in mind, because it requires first making a very long dataset and then re-widening it, unlike the melt approach above that does the not-so-long reshaping in one step.

    library(tidyr)
    mydf %>%
      gather(var, val, starts_with("P")) %>%
      mutate(var = gsub("([A-Z]+)", "\\1_", var)) %>%  ## you can probably be clever and...
      separate(var, into = c("variable", "time")) %>%  ## come up with some fancier regex
      spread(variable, val)
    

    Finally, there's also merged.stack from my "splitstackshape" package. With it, the approach would be like:

    library(splitstackshape)
    merged.stack(mydf, var.stubs = c("PLC", "P"), sep = "var.stubs")