Search code examples
rreshape2

How to turn data from long to wide format so that duplicate rows get added to the end to make new columns in R?


I got the following data.

 
ID  NAME   STATUS  OKR_T  OKR_N  NR
1   Jack     1      34     OK1    0
1   Jack     1     433     OK2    0
1   Jack     1      12     OK3    1
2   Bill     2      34     OK1    1
3   Steve    1     433     OK2    1
3   Steve    1      34     OK1    0
3   Steve    1      45     OK4    0

I need to get this:

 
ID  NAME   STATUS  OKR_T.x  OKR_N.x  NR.x  OKR_T.y  OKR_N.y   NR.y OKR_T.z  OKR_N.z  NR.z
1   Jack     1      34      OK1       0     433       OK2      0     12       OK3     1
2   Bill     2      34      OK1       1
3   Steve    1     433      OK2       1      34       OK1      0     45       OK4     0      

I put the .x ; .y and .z to the end of the column names because if i remember correctly this is how R names columns with the same name. It's not important to me exactly how the duplicate columns are named as long as i can tell the difference that what OKR_T ; OKR_N and NR came from the same original row.

I have limited expirience in R and so far i have yoused dcast to reshape my data from long to wide but i can't get it to work with it. Since the logic is different i think that i need another way to do it.

Thank you for your help :)


Solution

  • library(tidyverse)
    df <- read.table(
      text = "ID  NAME   STATUS  OKR_T  OKR_N  NR
    1   Jack     1      34     OK1    0
    1   Jack     1     433     OK2    0
    1   Jack     1      12     OK3    1
    2   Bill     2      34     OK1    1
    3   Steve    1     433     OK2    1
    3   Steve    1      34     OK1    0
    3   Steve    1      45     OK4    0",
      header = T
    )
    df %>%
      group_by(ID) %>%
      mutate(rid = row_number()) %>%
      pivot_wider(
        id_cols = c(ID, NAME, STATUS),
        names_from = rid,
        values_from = c(OKR_T, OKR_N, NR)
      )
    #> # A tibble: 3 x 12
    #> # Groups:   ID [3]
    #>      ID NAME  STATUS OKR_T_1 OKR_T_2 OKR_T_3 OKR_N_1 OKR_N_2 OKR_N_3  NR_1  NR_2
    #>   <int> <chr>  <int>   <int>   <int>   <int> <chr>   <chr>   <chr>   <int> <int>
    #> 1     1 Jack       1      34     433      12 OK1     OK2     OK3         0     0
    #> 2     2 Bill       2      34      NA      NA OK1     <NA>    <NA>        1    NA
    #> 3     3 Steve      1     433      34      45 OK2     OK1     OK4         1     0
    #> # ... with 1 more variable: NR_3 <int>
    

    Created on 2021-09-08 by the reprex package (v2.0.1)