Search code examples
rreshape2melt

Reshape/Melt data with two rows of variable names


I have a large data set that I need to reshape/melt. The problem that I have is that the first and second row are variable names (i.e., the first row is the id of a person and the second row lists four attributes for this person). In addition to this the first column captures the dates when the values were collected. To see a sample of my data set please take a look at Sheet1 in the following google sheet (https://docs.google.com/spreadsheets/d/19b_4hB6aM9JXReG67i9EF_sJVDHkNFHf4iShH8yrfOc/edit?usp=sharing). I would like to reshape my data to look like the Sheet2 in the above google sheet.

Is there a way of specifying that there are two IDs in my melt command. The first id is the first row (starting at the second column) and the second id is the first column (starting at the second row). If I could write a pseudo-R script for what I am trying to do I would write something like this.

melt(dt, id=c("Dates from A2:A6", "Person from B1:I1")

Thanks!


Solution

  • The following approach will do your job,

    library(tidyverse)
    name <- readr::read_csv("csvs1.csv", col_names = F, n_max = 2)
    
    name
    # A tibble: 2 x 9
      X1    X2       X3       X4       X5       X6       X7       X8       X9      
      <chr> <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>   
    1 NA    person_A person_A person_A person_A person_B person_B person_B person_B
    2 Dates var1     var2     var3     var4     var1     var2     var3     var4    
    
    nm <- paste(names[1,], names[2,], sep = "-")
    nm
    [1] "NA-Dates"      "person_A-var1" "person_A-var2" "person_A-var3" "person_A-var4" "person_B-var1" "person_B-var2"
    [8] "person_B-var3" "person_B-var4"
    
    data <- readr::read_csv("csvs1.csv", col_names = F, skip = 2)
    
    names(data) <- nm
    data
    # A tibble: 4 x 9
      `NA-Dates` `person_A-var1` `person_A-var2` `person_A-var3` `person_A-var4` `person_B-var1` `person_B-var2`
      <date>               <dbl>           <dbl>           <dbl>           <dbl>           <dbl>           <dbl>
    1 2021-05-01            45.0            43.9            67.9            35.7            40.5            38.6
    2 2021-05-02            50.4            52.6            67.9            41.9            41.7            43.9
    3 2021-05-03            53.7            49.1            67.9            49.6            41.3            38.6
    4 2021-05-04            53.7            52.6            73.2            45.7            29.8            33.3
    # ... with 2 more variables: person_B-var3 <dbl>, person_B-var4 <dbl>
    
    data %>% 
    +   pivot_longer(!`NA-Dates`, names_to = c('person', '.value'), names_sep = '-', names_prefix = 'person_')
    
    # A tibble: 8 x 6
      `NA-Dates` person  var1  var2  var3  var4
      <date>     <chr>  <dbl> <dbl> <dbl> <dbl>
    1 2021-05-01 A       45.0  43.9  67.9  35.7
    2 2021-05-01 B       40.5  38.6  60.7  32.6
    3 2021-05-02 A       50.4  52.6  67.9  41.9
    4 2021-05-02 B       41.7  43.9  42.9  40.3
    5 2021-05-03 A       53.7  49.1  67.9  49.6
    6 2021-05-03 B       41.3  38.6  48.2  39.5
    7 2021-05-04 A       53.7  52.6  73.2  45.7
    8 2021-05-04 B       29.8  33.3  57.1  16.3