
Spread data based on multiple key variables


My data:

df <- as.data.frame(cbind(Bilagstoptekst = c("A", "A", "A", "B", "B", "C", "D", "E", "E", "F", "F", "F", "F", "F"), 
              AKT=c("80", "80", "80", "80", "80", "25", "80", "80", "80", "80", "80", "25", "25", "80"), 
              IArt=c("HUVE", "HUVE", "HUVE", "HUVE", "HUBO", "BILÅ", "HUBO", "HUVE", "HUVE", "HUBO", "HUVE", "BILÅ", "BILÅ", "HUBO" ),
              Belob=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14)))

> df
Bilagstoptekst AKT IArt Belob
A               80 HUVE     1
A               80 HUVE     2
A               80 HUVE     3
B               80 HUVE     4
B               80 HUBO     5
C               25 BILÅ     6
D               80 HUBO     7
E               80 HUVE     8
E               80 HUVE     9
F               80 HUBO    10
F               80 HUVE    11
F               25 BILÅ    12
F               25 BILÅ    13
F               80 HUBO    14

I would like to spread the Belob column into separate columns, with one row for each combination of the keys Bilagstoptekst, AKT and IArt.

The output should look like this:

Bilagstoptekst AKT IArt Belob1 Belob2 Belob3 
A               80 HUVE     1     2      3
B               80 HUVE     4    NA     NA
B               80 HUBO     5    NA     NA
C               25 BILÅ     6    NA     NA
D               80 HUBO     7    NA     NA
E               80 HUVE     8     9     NA
F               80 HUBO    10    14     NA
F               80 HUVE    11    NA     NA
F               25 BILÅ    12    13     NA

I've tried both spread and dcast, but I can't get either to work.

In my real dataset I have thousands of rows, so this is just sample data.


Solution

  • Here is a way using dcast from data.table

    library(data.table)
    dt <- as.data.table(df)
    dt[, idx := rowid(Bilagstoptekst, AKT, IArt)] # creates the timevar
    out <- dcast(dt, 
                 Bilagstoptekst + AKT + IArt ~ paste0("Belob", idx),
                 value.var = "Belob")
    out
    #   Bilagstoptekst AKT IArt Belob1 Belob2 Belob3
    #1:              A  80 HUVE      1      2      3
    #2:              B  80 HUBO      5   <NA>   <NA>
    #3:              B  80 HUVE      4   <NA>   <NA>
    #4:              C  25 BILÅ      6   <NA>   <NA>
    #5:              D  80 HUBO      7   <NA>   <NA>
    #6:              E  80 HUVE      8      9   <NA>
    #7:              F  25 BILÅ     12     13   <NA>
    #8:              F  80 HUBO     10     14   <NA>
    #9:              F  80 HUVE     11   <NA>   <NA>
    

    The important part is the idx column: it numbers the rows within each combination of Bilagstoptekst, AKT and IArt, and serves as the "timevar" when the data is reshaped.
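
    To make the role of rowid() more concrete, here is a minimal sketch on two toy vectors. The names grp and sub are made up for illustration and are not part of the question's data; the point is only that the counter restarts for every distinct combination of the grouping values.

```r
library(data.table)

# Hypothetical toy vectors (assumption: not the question's data)
grp <- c("A", "A", "B", "A", "B")
sub <- c("x", "x", "y", "y", "y")

# Running count within each value of grp
single <- rowid(grp)
single
# [1] 1 2 1 3 2

# Running count within each (grp, sub) combination
combined <- rowid(grp, sub)
combined
# [1] 1 2 1 1 2
```

    Pasting a prefix onto this counter, as in paste0("Belob", idx), is what produces the Belob1, Belob2, ... column names in the wide result.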


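    In base R, the equivalent is:

    # number the rows within each key combination (the timevar)
    df$idx <- with(df, ave(seq_along(Belob), Bilagstoptekst, AKT, IArt, FUN = seq_along))
    # the value columns come out as Belob.1, Belob.2, Belob.3
    reshape(df, idvar = c("Bilagstoptekst", "AKT", "IArt"), timevar = "idx", direction = "wide")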
    

    The tidyverse approach is left as an exercise ;)
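
    For readers who want a starting point, one possible tidyverse version is sketched below. It is not part of the original answer and assumes dplyr and tidyr (1.0.0 or later, for pivot_wider()) are installed. It also rebuilds the sample data with data.frame() so that Belob stays numeric; the cbind() call in the question turns every column into character, which is why the data.table output above shows <NA>.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, built with data.frame() instead of cbind()
# (assumption: Belob is meant to be numeric)
df <- data.frame(
  Bilagstoptekst = c("A", "A", "A", "B", "B", "C", "D", "E", "E", "F", "F", "F", "F", "F"),
  AKT   = c("80", "80", "80", "80", "80", "25", "80", "80", "80", "80", "80", "25", "25", "80"),
  IArt  = c("HUVE", "HUVE", "HUVE", "HUVE", "HUBO", "BILÅ", "HUBO", "HUVE", "HUVE", "HUBO", "HUVE", "BILÅ", "BILÅ", "HUBO"),
  Belob = 1:14
)

wide <- df %>%
  group_by(Bilagstoptekst, AKT, IArt) %>%
  mutate(idx = row_number()) %>%   # same role as rowid() / the timevar
  ungroup() %>%
  pivot_wider(names_from = idx,    # one new column per idx value
              values_from = Belob,
              names_prefix = "Belob")

wide
```

    The result has one row per key combination and the columns Belob1, Belob2 and Belob3, with NA where a combination has fewer than three values.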


    Your question may be a duplicate of "Transpose / reshape dataframe without “timevar” from long to wide format".