Search code examples
rdatabasedatabase-designreshapewide-column-store

Transforming Wide for Long Database, Grouping Variables in R


I would like to make a modification to my database. What I am trying to do is transform every row of my database being for a single individual.

I want it to look like this. Example:

ID    massaseca Fator   Anexo     Teor  Tempo
1       334,68     AM       c      0       0
1       344,19     AM       c      0      10
1       347,32     AM       c      0      20 
1       350,2      AM       c      0      30 
1       352,52     AM       c      0      40
.        .          .       .      .       .
.        .          .       .      .       . 

Although I found some examples in the forum, I couldn't solve my problem. I will present both attempts. When I execute the code below, I find that it overrides some variables and is not the way you want.

####################################
require(reshape2)
long <- melt(dados1, id.vars = c("Teor", "Fator"))
melt(dados1, id.vars = 1:2)
melt(dados1, measure.vars = 4:45)
melt(dados1, measure.vars = as.character(10,20,30,
                                         40,50,60,70,80,
                                         90,105,120,135,150,
                                         170,190,210,230,250,
                                         270,290,310,330,350,
                                         360,405,450,510,570,
                                         630,690,750,810,870,
                                         930,990,1050,1110,1170,
                                         1230,1290,1350))

################# CASE 2 ###########
####################################
library(data.table)
long1 <- melt(setDT(dados1), id.vars = c("Teor", "Fator"), variable.name = "Tempo")
long1
melt(setDT(dados1), id.vars = 1:2, variable.name = "Tempo")
melt(setDT(dados1), measure.vars = 4:45, variable.name = "Tempo")
melt(setDT(dados1), measure.vars = as.character(10,20,30,
                                                  40,50,60,70,80,
                                                  90,105,120,135,150,
                                                  170,190,210,230,250,
                                                  270,290,310,330,350,
                                                  360,405,450,510,570,
                                                  630,690,750,810,870,
                                                  930,990,1050,1110,1170,
                                                  1230,1290,1350), variable.name = "Tempo")


However, with the example above it is filling one variable below another, how to do it?


Solution

  • There seems to be an unambiguous "id" column missing, which you may substitute by cbinding the row numbers. Then you could use base reshape where we grap the values for the times argument from the column names. (Note that you need to add 1 to the varyings if you set id as first column as I did.)

    res <- reshape(cbind(id=1:nrow(dados1), dados1), 
                   varying=5:46, 
                   v.names="massaseca",
                   timevar="Tempo", 
                   times=as.numeric(gsub("X", "", tail(names(dados1), -3))), 
                   direction="long", sep="")
    

    Result

    head(res[order(res$id), ])
    #      id Teor Fator Anexo Tempo massaseca
    # 1.0   1    0    Am     c     0    334,68
    # 1.10  1    0    Am     c    10    344,19
    # 1.20  1    0    Am     c    20    347,32
    # 1.30  1    0    Am     c    30     350,2
    # 1.40  1    0    Am     c    40    352,52
    # 1.50  1    0    Am     c    50    354,81