So I have this sample of UTC timestamps and a bunch of other data. I would like to group my data by date. This means I do not need hours/mins/secs and would like to have a new df which shows the number of actions grouped together.
I tried using lubridate to pull out the date but I cant get the origin right.
DATA
hw0 <- read.table(text =
'ID timestamp action
4f.. 20160305195246 visitPage
75.. 20160305195302 visitPage
77.. 20160305195312 checkin
42.. 20160305195322 checkin
8f.. 20160305195332 searchResultPage
29.. 20160305195342 checkin', header = T)
Here's what I tried
library(dplyr)
library(lubridate) #this will allow us to extract the date
daily <- hw0 %>%
mutate(date=date(as.POSIXct(timestamp),origin='1970-01-01'))
daily <- daily %>%
group_by(date)
I am unsure what to use as an origin and my error says this value is incorrect. Ultimately, I expect the code to return a new df which features a variable (date) with a list of unique dates as well as how many of the different actions there are in each day.
Assuming the numbers at the end are 24 hour time based, you can use:
daily = hw0 %>%
mutate(date = as.POSIXct(as.character(timestamp), format = '%Y%m%d%H%M%S'))
You can use as.Date
instead if you want to get rid of the hour times. You need to supply the origin when you give a numeric argument, which is interpreted as the number of days since the origin. In your case you should just give it a character vector and supply the date format.