Tags: r, time-series, scale, normalize

Apply the scale() function to every 24-hour period of data


I have several days of heart rate data sampled every second (with random gaps of missing data), like this:

structure(list(TimePoint = structure(c(1523237795, 1523237796, 
                                       1523237797, 1523237798, 1523237799, 1523237800, 1523237801, 1523237802, 
                                       1523237803, 1523237804), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
               HR = c(80L, 83L, 87L, 91L, 95L, 99L, 102L, 104L, 104L, 103L
               )), row.names = c(NA, 10L), class = "data.frame")

------------------------------
            TimePoint  HR
1  2018-04-09 01:36:35  80
2  2018-04-09 01:36:36  83
3  2018-04-09 01:36:37  87
4  2018-04-09 01:36:38  91
5  2018-04-09 01:36:39  95
6  2018-04-09 01:36:40  99
7  2018-04-09 01:36:41 102
8  2018-04-09 01:36:42 104
9  2018-04-09 01:36:43 104
10 2018-04-09 01:36:44 103
.
.
.

I would like to apply the scale(center = TRUE, scale = TRUE) function to the HR data to normalize it across participants.

  • However, I don't want to normalize across all available days at once, but within each 24-hour period
  • So if a participant has 3 days of data, their HR will be converted to z-scores 3 separate times, once for each day
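To make the target concrete: scale(x, center = TRUE, scale = TRUE) computes (x - mean(x)) / sd(x). A minimal base-R sketch on made-up toy data (the hr data frame is hypothetical, and UTC calendar days are assumed to define the 24-hour periods) shows two days with different HR levels, each centered and scaled on its own:

```r
# Hypothetical toy data: 4 readings on each of two days
hr <- data.frame(
  TimePoint = as.POSIXct(c("2018-04-09 08:00:00", "2018-04-09 12:00:00",
                           "2018-04-09 16:00:00", "2018-04-09 20:00:00",
                           "2018-04-10 08:00:00", "2018-04-10 12:00:00",
                           "2018-04-10 16:00:00", "2018-04-10 20:00:00"),
                         tz = "UTC"),
  HR = c(60, 70, 80, 90, 100, 110, 120, 130)
)

# What scale(v, center = TRUE, scale = TRUE) does, written out by hand
z_by_hand <- function(v) (v - mean(v)) / sd(v)

# Assumption: the UTC calendar day of each reading defines its group
day <- as.Date(hr$TimePoint, tz = "UTC")

# ave() applies FUN separately within each day and returns a vector
# aligned with the original rows; as.numeric() drops scale()'s matrix shape
hr$HR_sc <- ave(hr$HR, day, FUN = function(v) as.numeric(scale(v)))
hr$HR_by_hand <- ave(hr$HR, day, FUN = z_by_hand)
print(hr)
```

Both days come out as the same z-scores (about -1.16, -0.39, 0.39, 1.16) because each day is centered on its own mean; scaling both days together would instead put every day-1 reading below zero.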

I am having trouble doing this successfully.

  library(dplyr)      # provides %>%
  library(lubridate)  # provides wday() and hour()
  # read csv 
  DF = read.csv(x)
  # timestamps in the CSV are day.month.year (format '%d.%m.%Y'); convert them to class POSIXct
  x2 = as.POSIXct(DF[,1], format = '%d.%m.%Y %H:%M:%S', tz = "UTC") %>% data.frame()
  # rename column
  colnames(x2)[1] = "TimePoint"
  # add the participant HR data to this dataframe 
  x2$HR = DF[,2]
  # break time stamps into 60 minute windows
  by60 = cut(x2$TimePoint, breaks = "60 min")
  # get the average HR per 60 min window
  DF_Sum = aggregate(HR ~ by60, FUN = mean, data = x2)
  # add weekday / hour for future plot visualization 
  DF_Sum$WeekDay = wday(DF_Sum$by60, label = TRUE)
  DF_Sum$Hour = hour(DF_Sum$by60)

I am able to split the data into time windows and average HR by hour, but I cannot work out how to apply scale() within each 24-hour period.
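One way the scaling could slot into the pipeline above, as a sketch: z-score the raw HR within each day first, then average per hour. The x2 data below is a simulated, hypothetical stand-in (one reading per minute instead of per second), periods are assumed to be UTC calendar days, and base R's ave(), trunc() and format() stand in for cut() and lubridate so the block runs on its own.

```r
# Hypothetical stand-in for x2: one simulated reading per minute for 60 hours
set.seed(42)
n <- 3600
x2 <- data.frame(
  TimePoint = seq(as.POSIXct("2018-04-09 01:36:35", tz = "UTC"),
                  by = "min", length.out = n),
  HR = round(80 + 10 * sin(seq_len(n) / 200) + rnorm(n, sd = 3))
)

# 1. z-score the raw HR separately within each UTC calendar day
x2$Day <- as.Date(x2$TimePoint, tz = "UTC")
x2$HR_sc <- ave(x2$HR, x2$Day, FUN = function(v) as.numeric(scale(v)))

# 2. Average raw and scaled HR per hour (window start kept as POSIXct)
x2$HourStart <- as.POSIXct(trunc(x2$TimePoint, units = "hours"))
DF_Sum <- aggregate(cbind(HR, HR_sc) ~ HourStart, data = x2, FUN = mean)

# 3. Weekday / hour labels for plotting (weekday names depend on the locale)
DF_Sum$WeekDay <- format(DF_Sum$HourStart, "%a")
DF_Sum$Hour <- as.integer(format(DF_Sum$HourStart, "%H"))
head(DF_Sum)
```

The first and last days in this simulated data are partial days, so their z-scores rest on fewer hours; whether that is acceptable depends on the study design.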

Help appreciated.


Solution

  • Create 24-hour time intervals, group_by() participant ID and interval, then compute the scaled HR within each group.

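    library(dplyr)
    df %>% 
       # placeholder ID: remove this mutate() and use your data set's participant ID column in group_by()
       mutate(ID = 1) %>% 
       # cut() starts the 24-hour windows at the hour of the earliest timestamp (here 01:00);
       # use breaks = "day" for calendar days starting at midnight
       group_by(ID, Int = cut(TimePoint, breaks = "24 hours")) %>%  
       # scale() returns a one-column matrix; as.numeric() turns it back into a plain vector
       mutate(HR_sc = as.numeric(scale(HR, center = TRUE, scale = TRUE)))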
    
    # A tibble: 10 x 5
    # Groups:   ID, Int [1]
       TimePoint              HR    ID Int                   HR_sc
       <dttm>              <int> <dbl> <fct>                 <dbl>
     1 2018-04-09 01:36:35    80     1 2018-04-09 01:00:00 -1.63  
     2 2018-04-09 01:36:36    83     1 2018-04-09 01:00:00 -1.30  
     3 2018-04-09 01:36:37    87     1 2018-04-09 01:00:00 -0.860 
     4 2018-04-09 01:36:38    91     1 2018-04-09 01:00:00 -0.419 
     5 2018-04-09 01:36:39    95     1 2018-04-09 01:00:00  0.0221
     6 2018-04-09 01:36:40    99     1 2018-04-09 01:00:00  0.463 
     7 2018-04-09 01:36:41   102     1 2018-04-09 01:00:00  0.794 
     8 2018-04-09 01:36:42   104     1 2018-04-09 01:00:00  1.01  
     9 2018-04-09 01:36:43   104     1 2018-04-09 01:00:00  1.01  
    10 2018-04-09 01:36:44   103     1 2018-04-09 01:00:00  0.905
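A caveat on the grouping: group_by() evaluates computed columns such as Int = cut(...) on the ungrouped data, so every participant's 24-hour windows start at the hour of the earliest timestamp in the whole data set, not at each participant's own first reading. A sketch of per-participant windows, assuming a participant column called ID and using hypothetical two-participant data, computes a window index in a grouped mutate() before grouping by it:

```r
library(dplyr)

# Hypothetical data: two participants whose recordings start at different times
df <- data.frame(
  ID = rep(c("A", "B"), each = 6),
  TimePoint = as.POSIXct(c(
    "2018-04-09 01:36:35", "2018-04-09 12:00:00", "2018-04-10 01:00:00",
    "2018-04-10 02:00:00", "2018-04-10 12:00:00", "2018-04-11 01:00:00",
    "2018-04-09 18:00:00", "2018-04-10 06:00:00", "2018-04-10 17:00:00",
    "2018-04-10 19:00:00", "2018-04-11 06:00:00", "2018-04-11 17:00:00"),
    tz = "UTC"),
  HR = c(80, 90, 100, 70, 75, 85, 60, 65, 70, 90, 95, 100)
)

scaled <- df %>%
  group_by(ID) %>%
  # Window index counted from each participant's own first reading:
  # 0 = first 24 hours, 1 = next 24 hours, ...
  mutate(Window = floor(as.numeric(difftime(TimePoint, min(TimePoint),
                                            units = "hours")) / 24)) %>%
  group_by(ID, Window) %>%
  # z-score within each participant-window; as.numeric() drops the matrix shape
  mutate(HR_sc = as.numeric(scale(HR, center = TRUE, scale = TRUE))) %>%
  ungroup()

print(scaled)
```

If calendar days are wanted instead, replacing the Window computation with mutate(Window = as.Date(TimePoint, tz = "UTC")) groups by UTC date, which gives the same boundaries as cut(TimePoint, breaks = "day") for UTC timestamps.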