Search code examples
rlapplyreshapereshape2melt

How to reshape the data in R , apply top date row as column and group by id in


Hi i have the data like this

date    2020/06/10  2020/06/10  2020/06/10  2020/06/11  2020/06/11  2020/06/11  
id          x           y           z           x           y           z           
10432       0           0           0           0           0           0           
10668       0           0           0           0           0           0           
11088       0           0           0           0           0           0   

And i want my output like this

id          date    x   y   z
10432   2020/06/10  0   0   0
10432   2020/06/11  0   0   0
10668   2020/06/10  0   0   0
10668   2020/06/11  0   0   0
11088   2020/06/10  0   0   0
11088   2020/06/11  0   0   0

I would like to get the output in R


Solution

  • This data is extremely messy, but with some work I have managed to get your desired output.


    This is what the sample data you provided looks like after I imported it using read_table2() from the readr package:

    library(readr) # 1.3.1
    
    df <- read_table2("date    2020/06/10  2020/06/10  2020/06/10  2020/06/11  2020/06/11  2020/06/11  
    id          x           y           z           x           y           z           
    10432       0           0           0           0           0           0           
    10668       0           0           0           0           0           0           
    11088       0           0           0           0           0           0 ")
    
    df[, 8] <- NULL
    
    > df
    # A tibble: 4 x 7
      date  `2020/06/10` `2020/06/10_1` `2020/06/10_2` `2020/06/11` `2020/06/11_1` `2020/06/11_2`
      <chr> <chr>        <chr>          <chr>          <chr>        <chr>          <chr>         
    1 id    x            y              z              x            y              z             
    2 10432 0            0              0              0            0              0             
    3 10668 0            0              0              0            0              0             
    4 11088 0            0              0              0            0              0   
    

    Note that in R you can't have repeating column names so it changes them slightly. We have to resolve this. (You might not have to do these steps exactly, I cannot know precisely what your columns are called without you providing the data with dput().)

    Now I paste together the current column names with the 'id' row, and then tidy the column names so that we can work with them later on:

    names(df) <- c("id", paste(df[1, -1], names(df)[-1], sep = "_"))
    
    library(stringr) # 1.4.0
    
    names(df)[-1] <- str_remove(names(df)[-1], "_[1-9]$") # this gets rid of
    # the "_1" or "_2" R added automatically when I imported the data
    

    We now no longer need the first row as its contents have become repeated in the column names.

    df <- df[-1,]
    
    > df
    # A tibble: 3 x 7
      id    `x_2020/06/10` `y_2020/06/10` `z_2020/06/10` `x_2020/06/11` `y_2020/06/11` `z_2020/06/11`
      <chr> <chr>          <chr>          <chr>          <chr>          <chr>          <chr>         
    1 10432 0              0              0              0              0              0             
    2 10668 0              0              0              0              0              0             
    3 11088 0              0              0              0              0              0 
    

    Now we can actually work on this df and go from wide to long format:

    library(tidyr) # 1.1.0
    
    df %>% pivot_longer(-id, names_to = c(".value", "date"),
                          names_pattern = "(.)_(..../../..)")
    
    # A tibble: 6 x 5
      id    date       x     y     z    
      <chr> <chr>      <chr> <chr> <chr>
    1 10432 2020/06/10 0     0     0    
    2 10432 2020/06/11 0     0     0    
    3 10668 2020/06/10 0     0     0    
    4 10668 2020/06/11 0     0     0    
    5 11088 2020/06/10 0     0     0    
    6 11088 2020/06/11 0     0     0  
    

    So there we have it - a tidying data solution using the readr, stringr and tidyr packages from tidyverse - a very nice example, I think, of what tidyverse was precisely developed for! :)


    DATA

    Should you need it, here is the tidy df that we have just before we implement pivot_longer() on it:

    structure(list(id = c("10432", "10668", "11088"), `x_2020/06/10` = c("0", 
    "0", "0"), `y_2020/06/10` = c("0", "0", "0"), `z_2020/06/10` = c("0", 
    "0", "0"), `x_2020/06/11` = c("0", "0", "0"), `y_2020/06/11` = c("0", 
    "0", "0"), `z_2020/06/11` = c("0", "0", "0")), row.names = c(NA, 
    -3L), class = c("tbl_df", "tbl", "data.frame"))