Search code examples
rdplyrutc

How to group by timestamp in UTC by day in R


So I have this sample of UTC timestamps and a bunch of other data. I would like to group my data by date. This means I do not need hours/mins/secs and would like to have a new df which shows the number of actions grouped together.

I tried using lubridate to pull out the date but I cant get the origin right.

DATA

hw0 <- read.table(text = 
'ID   timestamp        action
4f.. 20160305195246   visitPage
75.. 20160305195302   visitPage
77.. 20160305195312   checkin
42.. 20160305195322   checkin
8f.. 20160305195332   searchResultPage
29.. 20160305195342   checkin', header = T)

Here's what I tried

library(dplyr)
library(lubridate) #this will allow us to extract the date
daily <- hw0 %>%
mutate(date=date(as.POSIXct(timestamp),origin='1970-01-01'))

daily <- daily %>%
group_by(date)

I am unsure what to use as an origin and my error says this value is incorrect. Ultimately, I expect the code to return a new df which features a variable (date) with a list of unique dates as well as how many of the different actions there are in each day.


Solution

  • Assuming the numbers at the end are 24 hour time based, you can use:

    daily = hw0 %>% 
      mutate(date = as.POSIXct(as.character(timestamp), format = '%Y%m%d%H%M%S'))
    

    You can use as.Date instead if you want to get rid of the hour times. You need to supply the origin when you give a numeric argument, which is interpreted as the number of days since the origin. In your case you should just give it a character vector and supply the date format.