Search code examples
rbigdatareshapeff

reshaping a large data frame from wide to long in R


I've been through the various reshape questions but don't believe this iteration has been asked before. I am dealing with a data frame of 81K rows and 4188 variables. Variables 161:4188 are the measurements present as different variables. The idvar is in column 1. I want to repeat columns 1:160 and create new records for columns 169:4188. The final data frame will be of the dimension 162 columns and 326,268,000 rows (81K * 4028 variables converted as unique records).

Here is what I tried:

reshapeddf <- reshape(c, idvar = "PID", varying = c(dput(names(c[161:4188]))), v.names = "viewership", timevar = "network.show", times = c(dput(names(c[161:4188]))), direction = "long")

The operation didn't complete. I waited almost 10 minutes. Is this the right way? I am on a Windows 7, 8GB RAM, i5 3.20ghz PC. What is the most efficient way to complete this transpose in R? Both of the answers by BondedDust and Nick are clever but I run into memory issues. Is there a way any of the three approaches in this thread- reshape, tidyr or the do.call be implemented using ff?

In Example Data below, columns 1:4 are the ones I want to repeat and columns 5:9 are the ones to create new records for.

structure(list(PID = c(1003401L, 1004801L, 1007601L, 1008601L, 
1008602L, 1011901L), HHID = c(10034L, 10048L, 10076L, 10086L, 
10086L, 10119L), HH.START.DATE = structure(c(1378440000, 1362974400, 
1399521600, 1352869200, 1352869200, 1404964800), class = c("POSIXct", 
"POSIXt"), tzone = ""), VISITOR.CODE = structure(c(1L, 1L, 1L, 
1L, 1L, 1L), .Label = c("0", "L"), class = "factor"), WEIGHTED.MINUTES.VIEWED..ABC...20.20.FRI = c(0, 
0, 305892, 0, 101453, 0), WEIGHTED.MINUTES.VIEWED..ABC...BLACK.ISH = c(0, 
0, 0, 0, 127281, 0), WEIGHTED.MINUTES.VIEWED..ABC...CASTLE = c(0, 
27805, 0, 0, 0, 0), WEIGHTED.MINUTES.VIEWED..ABC...CMA.AWARDS = c(0, 
679148, 0, 0, 278460, 498972), WEIGHTED.MINUTES.VIEWED..ABC...COUNTDOWN.TO.CMA.AWARDS = c(0, 
316448, 0, 0, 0, 0)), .Names = c("PID", "HHID", "HH.START.DATE", 
"VISITOR.CODE", "WEIGHTED.MINUTES.VIEWED..ABC...20.20.FRI", "WEIGHTED.MINUTES.VIEWED..ABC...BLACK.ISH", 
"WEIGHTED.MINUTES.VIEWED..ABC...CASTLE", "WEIGHTED.MINUTES.VIEWED..ABC...CMA.AWARDS", 
"WEIGHTED.MINUTES.VIEWED..ABC...COUNTDOWN.TO.CMA.AWARDS"), row.names = c(NA, 
6L), class = "data.frame")

Solution

  • Might be as easy as something like this:

       dat2 <- cbind(dat[1:4],   stack( dat[5:length(dat)] )