I have a large data set that I need to reshape/melt. The problem that I have is that the first and second row are variable names (i.e., the first row is the id of a person and the second row lists four attributes for this person). In addition to this the first column captures the dates when the values were collected. To see a sample of my data set please take a look at Sheet1 in the following google sheet (https://docs.google.com/spreadsheets/d/19b_4hB6aM9JXReG67i9EF_sJVDHkNFHf4iShH8yrfOc/edit?usp=sharing). I would like to reshape my data to look like the Sheet2 in the above google sheet.
Is there a way of specifying that there are two IDs in my melt command. The first id is the first row (starting at the second column) and the second id is the first column (starting at the second row). If I could write a pseudo-R script for what I am trying to do I would write something like this.
melt(dt, id=c("Dates from A2:A6", "Person from B1:I1")
Thanks!
The following approach will do your job,
library(tidyverse)
name <- readr::read_csv("csvs1.csv", col_names = F, n_max = 2)
name
# A tibble: 2 x 9
X1 X2 X3 X4 X5 X6 X7 X8 X9
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 NA person_A person_A person_A person_A person_B person_B person_B person_B
2 Dates var1 var2 var3 var4 var1 var2 var3 var4
nm <- paste(names[1,], names[2,], sep = "-")
nm
[1] "NA-Dates" "person_A-var1" "person_A-var2" "person_A-var3" "person_A-var4" "person_B-var1" "person_B-var2"
[8] "person_B-var3" "person_B-var4"
data <- readr::read_csv("csvs1.csv", col_names = F, skip = 2)
names(data) <- nm
data
# A tibble: 4 x 9
`NA-Dates` `person_A-var1` `person_A-var2` `person_A-var3` `person_A-var4` `person_B-var1` `person_B-var2`
<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2021-05-01 45.0 43.9 67.9 35.7 40.5 38.6
2 2021-05-02 50.4 52.6 67.9 41.9 41.7 43.9
3 2021-05-03 53.7 49.1 67.9 49.6 41.3 38.6
4 2021-05-04 53.7 52.6 73.2 45.7 29.8 33.3
# ... with 2 more variables: person_B-var3 <dbl>, person_B-var4 <dbl>
data %>%
+ pivot_longer(!`NA-Dates`, names_to = c('person', '.value'), names_sep = '-', names_prefix = 'person_')
# A tibble: 8 x 6
`NA-Dates` person var1 var2 var3 var4
<date> <chr> <dbl> <dbl> <dbl> <dbl>
1 2021-05-01 A 45.0 43.9 67.9 35.7
2 2021-05-01 B 40.5 38.6 60.7 32.6
3 2021-05-02 A 50.4 52.6 67.9 41.9
4 2021-05-02 B 41.7 43.9 42.9 40.3
5 2021-05-03 A 53.7 49.1 67.9 49.6
6 2021-05-03 B 41.3 38.6 48.2 39.5
7 2021-05-04 A 53.7 52.6 73.2 45.7
8 2021-05-04 B 29.8 33.3 57.1 16.3