Tags: r, pivot, pivot-table, tidyr, spread

Error in running a spread because of unique 'key combinations'; combining rows of data


I'm looking to do a group_by on Google Analytics data where I have a unique user identifier, the URL the user went to, and the number of times the user viewed that page.

The data come from Google Analytics like this:

ID          Page                  Pageviews
abc123      example.com/pagea     2 
qwer123     example.com/pageb     3 
abc123      example.com/pageb     4
qwer123     example.com/pagec     5 
uiop123     example.com/pagea     6

I'm trying to flip it into

ID        example.com/pagea    example.com/pageb    example.com/pagec
abc123    2                    4                    0
qwer123   0                    3                    5
uiop123   6                    0                    0  

However, when I use spread, I get an error: Error: Each row of output must be identified by a unique combination of keys.

The command I'm running is: df <- data %>% spread(Page, Pageviews, fill = 0)

Here's where I think I'm causing the issue: before I do the spread, I strip some data from the URLs to normalize them (basically removing query strings). That can leave several rows with the same ID and Page. So before the spread, I think I need to consolidate rows that share an ID and Page and sum their Pageviews, so that instead of two rows I have one.

Basically, I think I need to go back to the first part of the data and turn instances of:

ID          Page                  Pageviews
abc123      example.com/pagea     2 
abc123      example.com/pagea     3 

into

ID          Page                  Pageviews
abc123      example.com/pagea     5 

What's the least painful way to do that?


Solution

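  • Use dplyr first to sum Pageviews for each ID/Page pair, then spread:

    library(dplyr)
    library(tidyr)
    # Collapse duplicate ID/Page rows, then pivot pages into columns
    df <- data %>%
      group_by(ID, Page) %>%
      summarise(Pageviews = sum(Pageviews, na.rm = TRUE)) %>%
      spread(Page, Pageviews, fill = 0)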
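
To check the approach end to end, here is a minimal sketch on made-up data modelled on the question's sample rows, with one duplicate ID/Page pair added (abc123, example.com/pagea) to mimic what stripping query strings produces. The object names `totals`, `wide` and `wide_alt` are only illustrative. The second half shows an alternative that is not part of the original answer: `pivot_wider()`, which can aggregate duplicates while reshaping through `values_fn`. It assumes tidyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data: the first two rows share the same ID and Page.
data <- data.frame(
  ID = c("abc123", "abc123", "qwer123", "abc123", "qwer123", "uiop123"),
  Page = c("example.com/pagea", "example.com/pagea", "example.com/pageb",
           "example.com/pageb", "example.com/pagec", "example.com/pagea"),
  Pageviews = c(2, 3, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Step 1: collapse duplicate ID/Page pairs by summing their pageviews.
totals <- data %>%
  group_by(ID, Page) %>%
  summarise(Pageviews = sum(Pageviews, na.rm = TRUE)) %>%
  ungroup()

# Step 2: each ID/Page pair is now unique, so spread() no longer errors.
wide <- totals %>%
  spread(Page, Pageviews, fill = 0)

# Alternative (assumes tidyr >= 1.1.0): sum duplicates during the reshape.
wide_alt <- data %>%
  pivot_wider(names_from = Page, values_from = Pageviews,
              values_fn = sum, values_fill = 0)

print(wide)
print(wide_alt)
```

In both results abc123 should show 5 pageviews for example.com/pagea (2 + 3), and any ID/Page pair that never occurred is filled with 0.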