I have the following data frame, which records the names of individuals, their jobs, and how many hours they worked on a given day:
test_df <- data.frame(Name = c("Mark", "Stacy", "Anthony", "Colette"),
Job = c("Bartender", "Bartender", "Host", "Server"),
"01-Jan" = c(4, 5, 0, 6),
"03-Jan" = c(3, 7, 7, 8),
"04-Jan" = c(8, 0, 5, 4),
"07-Jan" = c(5, 6, 6, 7),
"08-Jan" = c(6, 8, 4, 0))
The above data frame only has one row corresponding to the name of each person, and then has a column for each day, which records how many hours a person worked in that particular day. I want to switch this, so that each person's name appears in 5 rows (corresponding to the 5 days represented in the data frame), and then there is "Date" column and an "Hours" column, showing how many hours a person worked and on what day, one row at a time. Like so:
test_df_2 <- data.frame(Name = c("Mark", "Mark", "Mark", "Mark", "Mark",
"Stacy", "Stacy", "Stacy", "Stacy", "Stacy",
"Anthony", "Anthony", "Anthony", "Anthony", "Anthony",
"Colette", "Colette", "Colette", "Colette", "Colette"),
Job = c("Bartender", "Bartender", "Bartender", "Bartender", "Bartender",
"Bartender", "Bartender", "Bartender", "Bartender", "Bartender",
"Host", "Host", "Host", "Host", "Host",
"Server", "Server", "Server", "Server", "Server"),
Date = c("01-01-2020", "01-03-2020", "01-04-2020", "01-07-2020", "01-08-2020",
"01-01-2020", "01-03-2020", "01-04-2020", "01-07-2020", "01-08-2020",
"01-01-2020", "01-03-2020", "01-04-2020", "01-07-2020", "01-08-2020",
"01-01-2020", "01-03-2020", "01-04-2020", "01-07-2020", "01-08-2020"),
Hours = c(4, 3, 8, 5, 6,
5, 7, 0, 6, 8,
0, 7, 5, 6, 4,
6, 8, 4, 7, 0))
I want to write a script in R to turn the first data frame into the second, but I'm unsure how. I've looked around at other stack overflow answers regarding combining multiple data frame columns into one, but I can't see how to get the rows to populate the right number of times, with the corresponding dates.
You could use tidyr::pivot_longer():
library(tidyr)
pivot_longer(test_df, 3:7, names_to="Date", values_to="Hours")
This will create the output structure you are looking for. You'll then only have to recode the values in the Date
column.
Another option would be using the reshape2::melt():
melt(test_df, id.vars=c("Name", "Job"), variable.name = "Date", value.name="Hours")
tidyr/pivot_longer appears to be the newer kid on the block, though.
This article explains the concepts at work quite well.