I am wondering if anyone has insight on how to approach this problem. I have a dataframe with the columns: Startime, Endtime and ID (dataframe below) I have already grouped the IDs. I am trying to figure out the time difference in seconds of a specific ‘grouped by’ ID (specifically A)
Starttime Endtime ID
12/18/2019 4:06:59 PM 12/18/2019 4:07:05 PM A
12/18/2019 4:07:26 PM 12/18/2019 4:07:28 PM A
12/17/2019 6:48:06 PM 12/17/2019 6:48:07 PM A
12/17/2019 6:25:16 PM 12/17/2019 6:25:22 PM A
I went about this by using the following code within R and utilizing the dplyr package:
data <-rawdata %>% group_by(ID) %>%
summarize(diff = difftime(last(as.POSIXct(Endtime, format ="%m/%d/%Y %I:%M:%S %p")),
first(as.POSIXct(Starttime, format = "%m/%d/%Y %I:%M:%S %p" )), units = "secs"))
However, the code above only calculates the difference between start and end time but not the differences in between. For instance, the code above calculates:
12/18/2019 4:06:59 PM and 12/17/2019 6:25:22 PM
I see this as inaccurate, because it doesn’t account for when 12/17 changes to 12/18, its simply taking the first and last value within the grouped ID.
Is there a way to subgroup data within already grouped data? (further group by time: month day and year and then take its difference in seconds?
Is there a way to do this? Further subgroup the data so that when the day changes, it will know to calculate this difference as well, versus only calculating the first and last values.
Starttime Endtime ID
12/18/2019 4:06:59PM 12/18/2019 4:07:05 PM A
12/18/2019 4:07:26PM 12/18/2019 4:07:28 PM A
12/17/2019 6:48:06PM 12/17/2019 6:48:07PM A
12/17/2019 6:25:16PM 12/17/2019 6:25:22PM A
Any suggestions will help! I feel like I am stuck on this. I will continue to research. Thank you!
******Edited for clarification******
I am trying to calculate my email durations. In this example, I have already grouped by IDs.
I wish to calculate my email read time duration for the ID 'A'. The code I am currently using just calculates its start and end time in seconds.
data <-rawdata %>% group_by(ID) %>% summarize(diff = difftime(last(as.POSIXct(Endtime, format = "%m/%d/%Y %I:%M:%S %p")),first(as.POSIXct(Starttime, format = "%m/%d/%Y %I:%M:%S %p" )), units = "secs"))
However, I think this is not accurate. Overall, I am wanting the time difference by each row for a more accurate reading for me to calculate my total email read time. The output would be:
Starttime Endtime
12/18/2019 4:06:59PM 12/18/2019 4:07:05 PM A 6 secs
12/18/2019 4:07:26PM 12/18/2019 4:07:28 PM A 1 secs
12/17/2019 6:48:06PM 12/17/2019 6:48:07PM A 1 sec
12/17/2019 6:25:16PM 12/17/2019 6:25:22PM A 6 secs
If you want a difference for each row, then you don't want to summarize for each ID but to mutate without grouping and remove the first
and last
function calls.
Therefore, your code should look like this:
data <- rawdata %>%
mutate(diff = difftime(as.POSIXct(Endtime, format = "%m/%d/%Y %I:%M:%S %p"),as.POSIXct(Starttime, format = "%m/%d/%Y %I:%M:%S %p" ), units = "secs"))
But, as you are using dates, you may want to use the lubridate
package, which is more flexible than base R
:
library(lubridate)
data <- rawdata %>%
mutate_at(vars(Endtime, Starttime), as.POSIXct, format = "%m/%d/%Y %I:%M:%S %p") %>%
mutate(diff = interval(Endtime, Starttime)/seconds(1))
Then, if you want the total, you can group by ID
and summarize the sum:
data %>%
group_by(ID) %>%
summarize(total=sum(diff))
For your next question, try to use the dput
function on your data to share a reproducible example, as I stated in my comment. This is much easier for a helper if he/she can just copy-paste your code and run it.