I am trying to restructure an enormous data frame (about 12,000 cases). In the old data frame, one person is one row with about 250 columns (e.g. Person 1, testA1, testA2, testB, ...), and I want all the results of test A (sets 1-10 overall, each with 24 items (A-Y)) for that person in one column, so one person ends up with 24 columns and 10 rows. There is also a fixed part before the items A-Y start (personal information like age, gender, etc.), which I want to keep as it is (fixdata). The function/loop below works for 30 cases (I tried it in advance), but for the 12,000 cases it has been running for nearly 24 hours now. Any ideas why?
restructure <- function(data, firstcol, numcol, numsets){
  # start with a dummy row so rbind has something to bind to
  out <- data.frame(t(rep(0, (firstcol - 1) + numcol)))
  names(out) <- names(data)[1:(firstcol + numcol - 1)]
  for (i in 1:nrow(data)) {
    # fixed part (personal information) of person i
    fixdata <- data[i, 1:(firstcol - 1)]
    # one block of numcol item columns per set
    for (j in seq(firstcol, (firstcol - 1) + numcol * numsets, by = numcol)) {
      flexdata <- data[i, j:(j + numcol - 1)]
      tmp <- cbind(fixdata, flexdata)
      names(tmp) <- names(data)[1:(firstcol + numcol - 1)]
      out <- rbind(out, tmp)   # out grows by one row each pass
    }
  }
  out <- out[2:nrow(out), ]    # drop the dummy row
  return(out)
}
Thanks in advance!
Idea why: you rbind to out in each iteration. Growing an object this way copies the whole of out every time, so each iteration takes longer than the last - expect worse-than-linear (roughly quadratic) growth in run time as the data set grows.
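Even keeping the loop, the usual fix is to collect the pieces in a preallocated list and rbind once at the end. A minimal sketch of the same function under that change (same arguments as above):

restructure2 <- function(data, firstcol, numcol, numsets) {
  pieces <- vector("list", nrow(data) * numsets)  # preallocate one slot per set per person
  k <- 0
  for (i in seq_len(nrow(data))) {
    fixdata <- data[i, 1:(firstcol - 1)]
    for (j in seq(firstcol, (firstcol - 1) + numcol * numsets, by = numcol)) {
      flexdata <- data[i, j:(j + numcol - 1)]
      # rename each block to the first set's names so the rows bind cleanly
      names(flexdata) <- names(data)[firstcol:(firstcol + numcol - 1)]
      k <- k + 1
      pieces[[k]] <- cbind(fixdata, flexdata)
    }
  }
  do.call(rbind, pieces)  # one rbind instead of one per iteration
}

This does the same work but copies the result only once, so run time grows roughly linearly in nrow(data) * numsets.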
Beyond that, as Andrie says, you can look at melt (e.g. from the reshape2 package).
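A minimal sketch with made-up column names (the toy daten below is my stand-in, not the original data):

library(reshape2)

# toy stand-in: two fixed columns, then the item columns
daten <- data.frame(id     = 1:3,
                    gender = c("f", "m", "f"),
                    A1     = c(10, 11, 12),
                    A2     = c(20, 21, 22),
                    B1     = c(30, 31, 32))

# id.vars is the fixed part; every other column is melted into
# one (variable, value) pair per original cell
long <- melt(daten, id.vars = c("id", "gender"))
head(long)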
Or you can do it with base R: stack. Then you need to cbind the fixed part to the result yourself; since stack concatenates the item columns one after another, the whole fixed block has to be repeated n.var.cols times (rep(..., times = n.var.cols)), as in the sketch below.
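Reusing the toy daten from the melt sketch (n.var.cols and the 1:2 fixed-column index are assumptions about the layout):

# item columns: everything after the two fixed columns
var.cols   <- names(daten)[-(1:2)]
n.var.cols <- length(var.cols)

# stack puts all item values into one values/ind pair, column by column,
# so the fixed block must repeat as a whole, times = n.var.cols
stacked <- stack(daten[var.cols])
fixed   <- daten[rep(seq_len(nrow(daten)), times = n.var.cols), 1:2]
long2   <- cbind(fixed, stacked)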
A third alternative would be array2df from package arrayhelpers.