I have a dataset that looks like this.
# create data frame example
df1 <- data.frame("ckdate" = c("2016-04-28 22:30:00", "2016-05-14 12:00:00", "
2016-05-15 02:00:00", "2016-05-16 00:00:00", "2016-05-17 03:30:00"),
"lab_value" = c(12, 35, 9, 55, 90),
"pid" = c(1, 1, 2, 2, 2)
)
The goal here is to normailze the date times into 12 hour blocks and then calculate the average "lab_value" within those blocks across patients. Ultimately, so I can make a plot that shows the average lab value at each 12 hour time point. Here is what I have coded up so far
normalize_dates <- function(df){
df <- df %>%
mutate(ckdate = as.POSIXct(ckdate, format = "%Y-%m-%d %H:%M:%S")) %>%
mutate(hours_since_first = as.numeric(difftime(ckdate, min(ckdate), units = "hours"))) %>%
mutate(hours_since_first = hours_since_first %/% 12)
return(df)
}
df2 <- normalize_dates(df1) # This returns NAs for all differences
There are a number of missing (NA values). So the first thing I am wondering is if difftime
can handle NA, and if so is there and na.omit
option? Or is there a different way to calculate these differences?
Then for calculating the means I have
# finally, for each 12 hour block, take the average lab_value accross pids
df1 %>%
group_by(hours_since_first) %>%
summarize(mean_lab_value = mean(lab_value))
But I can't really test this yet because I can't get the 12 hour blocks. I work a lot in R, but have very little experience working with date times, so any help here would be greatly appreciated.
You have a space in your df1$ckdate
file , which blows up all your values.
your input creates, note the \n
:
ckdate lab_value pid
1 2016-04-28 22:30:00 12 1
2 2016-05-14 12:00:00 35 1
3 \n2016-05-15 02:00:00 9 2
4 2016-05-16 00:00:00 55 2
5 2016-05-17 03:30:00 90 2
cleanup your data like so:
df1 <- data.frame(ckdate = c("2016-04-28 22:30:00", "2016-05-14 12:00:00", "2016-05-15 02:00:00", "2016-05-16 00:00:00", "2016-05-17 03:30:00"),
lab_value = c(12, 35, 9, 55, 90),
pid = c(1, 1, 2, 2, 2)
)
and the function will work
norm_dates <- function(df){
df %>%
mutate(ckdate = as.POSIXct(ckdate, format = "%Y-%m-%d %H:%M:%S"),
dif = as.numeric(difftime(ckdate, min(ckdate), units = 'hours')) %/% 12)
}
norm_dates(df1)