Time series breaks not even on the half hour for an imported dataset


I have been working on an analysis and grouping data into half-hourly averages for convenience, because the data resolution is very fine (one reading every 2 minutes for a couple of months).

Data are imported as this:

unq_id    dat_tim      sens_hgt  leaf_temp_c  
   1    5/18/17 10:22      2      29.82043 
   2    5/18/17 10:24      2      32.27954 
   3    5/18/17 10:26      2      32.48996 
   4    5/18/17 10:28      2      31.81604 
   5    5/18/17 10:30      2      31.56943

The issue is that when I add a half-hourly break class, the breaks fall at 30-minute increments counted from the first date-time measurement rather than on the clock half hour. Code used:

leaf_temp_df <- read.csv("leaf_master.csv",header = TRUE, sep = ",")
leaf_temp_df$halfhour <- cut(as.POSIXct(paste(leaf_temp_df$dat_tim),
                                        format = "%m/%d/%y %H:%M"), breaks = "30 min")  

output:

unq_id    dat_tim     sens_hgt  leaf_temp_c        halfhour
   1   5/18/17 10:22      2      29.82043     2017-05-18 10:22:00
   2   5/18/17 10:24      2      32.27954     2017-05-18 10:22:00
   3   5/18/17 10:26      2      32.48996     2017-05-18 10:22:00
   4   5/18/17 10:28      2      31.81604     2017-05-18 10:22:00
   5   5/18/17 10:30      2      31.56943     2017-05-18 10:22:00

The output follows that pattern until it reaches the next break at 10:52:00.
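
The behaviour can be reproduced without the CSV. This is a minimal sketch, assuming the rows shown above are rebuilt inline, that a 10:52 reading exists (added here only to show the second bin), and that the timestamps are read as UTC:

```r
# Inline stand-in for the dat_tim column (the 10:52 row is an assumed extra reading)
dat_tim <- c("5/18/17 10:22", "5/18/17 10:24", "5/18/17 10:26",
             "5/18/17 10:28", "5/18/17 10:30", "5/18/17 10:52")

# UTC is an assumption; substitute the logger's actual time zone
times <- as.POSIXct(dat_tim, format = "%m/%d/%y %H:%M", tz = "UTC")

# cut() places the first break at the earliest timestamp (truncated to the
# minute) and counts 30-minute steps from there, not from the clock half hour
halfhour <- cut(times, breaks = "30 min")

# The first five readings share one bin anchored at 10:22
table(halfhour)
```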

I'd like the halfhour vector to be even on the half hour (e.g. 10:30:00 and 11:00:00) to enable compatibility between different data types. To fix this I tried to skip the first four lines of data to make unq_id = 5 the first line read as it begins on 10:30.

leaf_temp_df <- read.csv("leaf_master.csv", header = TRUE, sep = ",")[-c(1:4),]

This still produces half-hour breaks starting at 10:22. I've even tried editing the master data file and deleting lines 1-4 so that the first date-time read falls on an even half hour (10:30), but the 10:22 issue persists.


Solution

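  • I prefer lubridate::floor_date for this, since it rounds each timestamp down to a clock boundary instead of counting from the first observation (df below is the question's leaf_temp_df):

    library(lubridate)
    # Parse the month/day/year hour:minute strings into date-times
    df$dat_tim <- mdy_hm(df$dat_tim)
    # Round each timestamp down to the start of its half hour
    df$halfhour <- floor_date(df$dat_tim, "30 minutes")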
    
    
      unq_id             dat_tim sens_hgt leaf_temp_c            halfhour
    1      1 2017-05-18 10:22:00        2    29.82043 2017-05-18 10:00:00
    2      2 2017-05-18 10:24:00        2    32.27954 2017-05-18 10:00:00
    3      3 2017-05-18 10:26:00        2    32.48996 2017-05-18 10:00:00
    4      4 2017-05-18 10:28:00        2    31.81604 2017-05-18 10:00:00
    5      5 2017-05-18 10:30:00        2    31.56943 2017-05-18 10:30:00
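
If adding a package is not an option, the same flooring can be sketched in base R by working on the seconds-since-epoch value, and the half-hourly averages then follow from aggregate(). This is only a sketch under stated assumptions: the inline data frame stands in for leaf_master.csv, the time zone is assumed to be UTC, and floor_halfhour is a hypothetical helper name, not a function from any package.

```r
# Stand-in for leaf_master.csv (values copied from the question)
leaf_temp_df <- data.frame(
  unq_id      = 1:5,
  dat_tim     = c("5/18/17 10:22", "5/18/17 10:24", "5/18/17 10:26",
                  "5/18/17 10:28", "5/18/17 10:30"),
  sens_hgt    = 2,
  leaf_temp_c = c(29.82043, 32.27954, 32.48996, 31.81604, 31.56943)
)

# Parse the timestamps; UTC is an assumption, use the logger's real time zone
times <- as.POSIXct(leaf_temp_df$dat_tim, format = "%m/%d/%y %H:%M", tz = "UTC")

# Hypothetical helper: drop the remainder past the last multiple of 1800 s
floor_halfhour <- function(x) {
  secs <- as.numeric(x)
  as.POSIXct(secs - secs %% 1800, origin = "1970-01-01", tz = "UTC")
}

leaf_temp_df$halfhour <- floor_halfhour(times)

# Mean leaf temperature per half-hour bin
half_hourly <- aggregate(leaf_temp_c ~ halfhour, data = leaf_temp_df, FUN = mean)
```

Flooring on epoch seconds lines up with the clock only when the time zone's UTC offset is a multiple of 30 minutes, which is one reason floor_date is the more convenient choice.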