Search code examples
rdataframedatemultiple-columnsstacked

How to combine multiple data frame date columns into one stacked date column in R


I have the following data frame, which records the names of individuals, their jobs, and how many hours they worked on a given day:

test_df <- data.frame(Name = c("Mark", "Stacy", "Anthony", "Colette"),
                      Job = c("Bartender", "Bartender", "Host", "Server"),
                      "01-Jan" = c(4, 5, 0, 6),
                      "03-Jan" = c(3, 7, 7, 8),
                      "04-Jan" = c(8, 0, 5, 4),
                      "07-Jan" = c(5, 6, 6, 7),
                      "08-Jan" = c(6, 8, 4, 0))

The above data frame only has one row corresponding to the name of each person, and then has a column for each day, which records how many hours a person worked in that particular day. I want to switch this, so that each person's name appears in 5 rows (corresponding to the 5 days represented in the data frame), and then there is "Date" column and an "Hours" column, showing how many hours a person worked and on what day, one row at a time. Like so:

test_df_2 <- data.frame(Name = c("Mark", "Mark", "Mark", "Mark", "Mark",
                               "Stacy", "Stacy", "Stacy", "Stacy", "Stacy",
                               "Anthony", "Anthony", "Anthony", "Anthony", "Anthony",
                               "Colette", "Colette", "Colette", "Colette", "Colette"),
                      Job = c("Bartender", "Bartender", "Bartender", "Bartender", "Bartender",
                              "Bartender", "Bartender", "Bartender", "Bartender", "Bartender",
                              "Host", "Host", "Host", "Host", "Host",
                              "Server", "Server", "Server", "Server", "Server"),
                      Date = c("01-01-2020", "01-03-2020", "01-04-2020", "01-07-2020", "01-08-2020",
                              "01-01-2020", "01-03-2020", "01-04-2020", "01-07-2020", "01-08-2020",
                              "01-01-2020", "01-03-2020", "01-04-2020", "01-07-2020", "01-08-2020",
                              "01-01-2020", "01-03-2020", "01-04-2020", "01-07-2020", "01-08-2020"),
                      Hours = c(4, 3, 8, 5, 6,
                               5, 7, 0, 6, 8,
                               0, 7, 5, 6, 4,
                               6, 8, 4, 7, 0))

I want to write a script in R to turn the first data frame into the second, but I'm unsure how. I've looked around at other stack overflow answers regarding combining multiple data frame columns into one, but I can't see how to get the rows to populate the right number of times, with the corresponding dates.


Solution

  • You could use tidyr::pivot_longer():

    library(tidyr)
    
    pivot_longer(test_df, 3:7, names_to="Date", values_to="Hours")
    

    This will create the output structure you are looking for. You'll then only have to recode the values in the Date column.

    Another option would be using the reshape2::melt():

    melt(test_df, id.vars=c("Name", "Job"), variable.name = "Date", value.name="Hours")
    

    tidyr/pivot_longer appears to be the newer kid on the block, though.

    This article explains the concepts at work quite well.