Search code examples
rdplyrtidyrampl

Reorder dataframe programmatically instead of subseting it


I am trying to transform a dataframe to a specific format in order to export it from R and use it in AMPL.

My initial dataframe is the following

test <- structure(list(from = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), 
to = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3), beta = c(0.0214674078064637, 
0.0205966237172006, 0.0197611613089226, 0.0214674078064637, 
0.0205966237172006, 0.0197611613089226, 0.0214674078064637, 
0.0205966237172006, 0.0197611613089226, 0.0214674078064637, 
0.0205966237172006, 0.0197611613089226), Time = c(0L, 0L, 
0L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L)), class = "data.frame", row.names = c(NA, 
-12L), .Names = c("from", "to", "beta", "Time"))

This is the test data.frame

   from to       beta Time
1     1  1 0.02146741    0
2     1  2 0.02059662    0
3     1  3 0.01976116    0
4     1  1 0.02146741    1
5     1  2 0.02059662    1
6     1  3 0.01976116    1
7     1  1 0.02146741    2
8     1  2 0.02059662    2
9     1  3 0.01976116    2
10    1  1 0.02146741    3
11    1  2 0.02059662    3
12    1  3 0.01976116    3

my desired final result is this one:

        V T1          0 T2          1 T3          2 T4          3 line
1 [1,1,*]  0 0.02146741  1 0.02146741  2 0.02146741  3 0.02146741   \n
2 [1,2,*]  0 0.02059662  1 0.02059662  2 0.02059662  3 0.02059662   \n
3 [1,3,*]  0 0.01976116  1 0.01976116  2 0.01976116  3 0.01976116   \n

The way I do this for now is this:

Betas <- unite_(test, col = "V", sep = ",", from = c("from", "to"))
Betas <- spread(Betas, key = Time, value = beta)
Betas$V <- paste("[", Betas$V, ",*]", sep = "")
Betas$T1 <- 0
Betas$T2 <- 1
Betas$T3 <- 2
Betas$T4 <- 3
Betas <- Betas[,c(1,6,2,7,3,8,4,9,5)]
Betas$line <- "\n"

I like the process of my first 3 lines of code

Betas <- unite_(test, col = "V", sep = ",", from = c("from", "to"))
Betas <- spread(Betas, key = Time, value = beta)
Betas$V <- paste("[", Betas$V, ",*]", sep = "")

But I would like to replace the lines below, the reason for this, is that the time column could be repeated X times. Now times will always go from 0 to X, how could I do this knowing the value of X programmatically?

Betas$T1 <- 0
Betas$T2 <- 1
Betas$T3 <- 2
Betas$T4 <- 3
Betas <- Betas[,c(1,6,2,7,3,8,4,9,5)]
Betas$line <- "\n"

Solution

  • Base R solution

    temp = split(test, test$Time)
    do.call(cbind, lapply(1:length(temp), function(i){
        if (i == 1){
            setNames(data.frame(paste(temp[[i]][["from"]], ",", temp[[i]][["to"]], ",*", sep = ""), temp[[i]]["Time"], temp[[i]]["beta"]), 
                     c("V", paste("T", i, sep = ""), i-1))
        } else if (i == length(temp)){
            setNames(data.frame(temp[[i]]["Time"], temp[[i]]["beta"], rep("\n", NROW(temp[[i]]))), 
                     c(paste("T", i, sep = ""), i-1, "line"))
        } else {
            setNames(data.frame(temp[[i]]["Time"], temp[[i]]["beta"]),
                     c(paste("T", i, sep = ""), i-1))
        }
    }))
    #      V T1          0 T2          1 T3          2 T4          3 line
    #1 1,1,*  0 0.02146741  1 0.02146741  2 0.02146741  3 0.02146741   \n
    #2 1,2,*  0 0.02059662  1 0.02059662  2 0.02059662  3 0.02059662   \n
    #3 1,3,*  0 0.01976116  1 0.01976116  2 0.01976116  3 0.01976116   \n