I'm looking to do a group_by on Google Analytics data where I have a unique user identifier, URL the person went to, and then the number of times the user went to that page.
The data come from Google Analytics like this:
ID Page Pageviews
abc123 example.com/pagea 2
qwer123 example.com/pageb 3
abc123 example.com/pageb 4
qwer123 example.com/pagec 5
uiop123 example.com/pagea 6
I'm trying to flip it into
ID example.com/pagea example.com/pageb example.com/pagec
abc123 2 4 0
qwer123 0 3 5
uiop123 6 0 0
However, when I use spread, I get an error: Error: Each row of output must be identified by a unique combination of keys
.
The command I'm running is:
df <- data %>% spread(Page, Pageviews, fill = 0)
Here's where I think I'm causing the issue: Before I do the spread, I am removing some data from the URLs to normalize the URLs (basically removing query strings). So before I do the spread, I think I need to consolidate where I have the same ID and Page and then add the combined Pageviews, so rather than a two rows, I now have 1.
Basically, I think I need to go to the first part of the data and turn instances of :
ID Page Pageviews
abc123 example.com/pagea 2
abc123 example.com/pagea 3
into
ID Page Pageviews
abc123 example.com/pagea 5
What's the least painful way to do that?
Use dplyr first :
library(dplyr)
library(tidyr)
df <- data %>% group_by(ID,Page) %>%
summarise(Pageviews = sum(Pageviews,na.rm=T) %>%
spread(Page, Pageviews, fill = 0)