If I have a dataset that has scores from the same measure collected at different time points, how can I organize those dates/times so they represents a timepoint after a certain date? Is this possible to do in R or would it be easier for me to do this in another program?
I have a dataset that currently looks like this:
id date score1_date score1 score2_date score2 score3_date score3
101 1/6/2020 1/1/2020 20 1/8/2020 18 1/15/2020 16
102 2/27/2020 2/14/2020 16 2/21/2020 16 2/28/2020 10
103 1/10/2020 1/7/2020 30 1/14/2020 25 1/21/2020 20
104 3/5/2020 3/6/2020 40 3/13/2020 42 3/20/2020 40
I want to find the closest [score#_date] to [date] and identify that as [time1] and then have everything that follows as [time2], [time3], etc.
Here is the code for that above table:
structure(list(id = c(101, 102, 103, 104), date = structure(c(18267,
18319, 18271, 18326), class = "Date"), score1_date = structure(c(18262,
18306, 18268, 18327), class = "Date"), score1 = c(20, 16, 30,
40), score2_date = structure(c(18269, 18313, 18275, 18334), class = "Date"),
score2 = c(18, 16, 25, 42), score3_date = structure(c(18276,
18320, 18282, 18341), class = "Date"), score3 = c(16, 10,
20, 40)), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"))
So I would eventually want the dataset to have variables that look something like this:
id date time1_date time1_score time2_date time2_score time3_date time3_score
101 1/6/2020 1/8/2020 18 1/15/2020 16 NA NA
102 2/27/2020 2/28/2020 10 NA NA NA NA
103 1/10/2020 1/7/2020 30 1/14/2020 25 1/21/2020 20
104 3/5/2020 3/6/2020 40 3/13/2020 42 3/20/2020 40
Thank you so much!
Using tidyverse
functions you can do :
library(dplyr)
library(tidyr)
df %>%
#Rename date column to base_date
rename(base_date = date) %>%
#Rename score1, score2 etc to score1_value, score2_value etc
rename_with(~paste0(., '_value'), matches('^score\\d+$')) %>%
#get the data in long format with date and value as two columns
pivot_longer(cols = starts_with('score'),
names_to = c('score', '.value'),
names_sep = '_') %>%
group_by(id) %>%
#Keep only those date where the date is greater than closest date
filter(date >= date[which.min(abs(date - base_date))]) %>%
#Arrange the data
arrange(id, date) %>%
#Create new column name
mutate(score = paste0('time', row_number())) %>%
ungroup %>%
#Get the data in wide format
pivot_wider(names_from = score, values_from = c(date, value)) %>%
#Arrange the columns
select(id, base_date, order(suppressWarnings(readr::parse_number(names(.)))))
# id base_date date_time1 value_time1 date_time2 value_time2 date_time3 value_time3
# <dbl> <date> <date> <dbl> <date> <dbl> <date> <dbl>
#1 101 2020-01-06 2020-01-08 18 2020-01-15 16 NA NA
#2 102 2020-02-27 2020-02-28 10 NA NA NA NA
#3 103 2020-01-10 2020-01-07 30 2020-01-14 25 2020-01-21 20
#4 104 2020-03-05 2020-03-06 40 2020-03-13 42 2020-03-20 40