Search code examples
rtidyverse

Working with date-times to average values across records in [R]


I have a dataset that looks like this.

# create data frame example
df1 <- data.frame("ckdate" = c("2016-04-28 22:30:00", "2016-05-14 12:00:00", "  
2016-05-15 02:00:00", "2016-05-16 00:00:00", "2016-05-17 03:30:00"),
                  "lab_value" = c(12, 35, 9, 55, 90),
                  "pid" = c(1, 1, 2, 2, 2)
                  )

The goal here is to normailze the date times into 12 hour blocks and then calculate the average "lab_value" within those blocks across patients. Ultimately, so I can make a plot that shows the average lab value at each 12 hour time point. Here is what I have coded up so far

normalize_dates <- function(df){
  df <- df %>% 
    mutate(ckdate = as.POSIXct(ckdate, format = "%Y-%m-%d %H:%M:%S")) %>% 
    mutate(hours_since_first = as.numeric(difftime(ckdate, min(ckdate), units = "hours"))) %>% 
    mutate(hours_since_first = hours_since_first %/% 12)
  return(df)
}
df2 <- normalize_dates(df1) # This returns NAs for all differences

There are a number of missing (NA values). So the first thing I am wondering is if difftime can handle NA, and if so is there and na.omit option? Or is there a different way to calculate these differences?

Then for calculating the means I have

# finally, for each 12 hour block, take the average lab_value accross pids
df1 %>% 
  group_by(hours_since_first) %>% 
  summarize(mean_lab_value = mean(lab_value))

But I can't really test this yet because I can't get the 12 hour blocks. I work a lot in R, but have very little experience working with date times, so any help here would be greatly appreciated.


Solution

  • You have a space in your df1$ckdate file , which blows up all your values.

    your input creates, note the \n:

                       ckdate lab_value pid
    1     2016-04-28 22:30:00        12   1
    2     2016-05-14 12:00:00        35   1
    3   \n2016-05-15 02:00:00         9   2
    4     2016-05-16 00:00:00        55   2
    5     2016-05-17 03:30:00        90   2
    

    cleanup your data like so:

    df1 <- data.frame(ckdate = c("2016-04-28 22:30:00", "2016-05-14 12:00:00", "2016-05-15 02:00:00", "2016-05-16 00:00:00", "2016-05-17 03:30:00"),
                      lab_value = c(12, 35, 9, 55, 90),
                      pid = c(1, 1, 2, 2, 2)
    )
    

    and the function will work

    norm_dates <- function(df){
    df %>% 
      mutate(ckdate = as.POSIXct(ckdate, format = "%Y-%m-%d %H:%M:%S"),
             dif = as.numeric(difftime(ckdate, min(ckdate), units = 'hours')) %/% 12)
    }
    norm_dates(df1)