Search code examples
rdplyrpivot-tablemeltdcast

R Pivot table with columns converted to only rows


I basically do reporting on marketing data and am trying work on some automation in R.

In excel we can drag and select columns into the "Rows" area of a pivot table to summarise the data. I have a similar situation where I would like to group or summarise 2 out of the 3 columns of my dataset, where the 3rd column is the key which differentiates the other two.

In detail: in the below dataset I have 3 columns(with dummy data) naming, Placement ID, Date and Device. Here we see that a Placement ID is repeating on the same day 3 times but on different Devices.

Placement ID      Date  Device
12456362    31-08-2018  Mobile 
12456363    30-08-2018  Desktop
12456364    25-08-2018  Tablet
12456362    31-08-2018  Tablet
12456363    30-08-2018  Desktop
12456364    25-08-2018  Mobile 
12456362    31-08-2018  Desktop
12456363    30-08-2018  Mobile 
12456364    25-08-2018  Tablet
12456362    24-08-2018  Tablet
12456363    12-08-2018  Desktop
12456364    10-08-2018  Mobile 
12456362    19-08-2018  Desktop
12456363    25-08-2018  Mobile 
12456364    25-08-2018  Tablet
12456362    10-08-2018  Mobile 
12456363    19-08-2018  Desktop
12456364    30-08-2018  Tablet

Now I have reference file where I have to fill cost for each placement ID. The reference file has only two columns, Date and Placement ID where there are no duplicates, i.e for each Placement ID there are different dates and the respective cost associated with it. Here is an example:

Placement ID      Date  Cost
12456362    10-08-2018  1200
12456362    19-08-2018  1300
12456362    24-08-2018  1400
12456362    31-08-2018  1500
12456363    12-08-2018  1600
12456363    19-08-2018  1700
12456363    25-08-2018  1800
12456363    30-08-2018  1900
12456364    10-08-2018  2000
12456364    25-08-2018  2100
12456364    30-08-2018  2200

Now in excel I would have simply created the pivot table and have had both date and placement ID dragged under the 'Rows' field to have it all aggregated, here is the desired output:

Placement ID      Date 
12456362    10-08-2018
12456362    19-08-2018
12456362    24-08-2018
12456362    31-08-2018
12456363    12-08-2018
12456363    19-08-2018
12456363    25-08-2018
12456363    30-08-2018
12456364    10-08-2018
12456364    25-08-2018
12456364    30-08-2018

I would like to achieve the above result in R. I have tried and melt and cast function, however I am not associating the 2 columns with any values when I melt the dataset, which i cannot do, so df_melt is returning more rows than the original count of df. Here is the code:

df_Melt <- melt(_df, id.vars=c("Date","Placement ID"))

Thanks in advance.


Solution

  • # example data
    df = read.table(text = "
    PlacementID      Date  Device
    12456362    31-08-2018  Mobile 
    12456363    30-08-2018  Desktop
    12456364    25-08-2018  Tablet
    12456362    31-08-2018  Tablet
    12456363    30-08-2018  Desktop
    12456364    25-08-2018  Mobile 
    12456362    31-08-2018  Desktop
    12456363    30-08-2018  Mobile 
    12456364    25-08-2018  Tablet
    12456362    24-08-2018  Tablet
    12456363    12-08-2018  Desktop
    12456364    10-08-2018  Mobile 
    12456362    19-08-2018  Desktop
    12456363    25-08-2018  Mobile 
    12456364    25-08-2018  Tablet
    12456362    10-08-2018  Mobile 
    12456363    19-08-2018  Desktop
    12456364    30-08-2018  Tablet
    ", header=T, stringsAsFactors=T)
    
    library(dplyr)
    library(lubridate)
    
    df %>%
      distinct(PlacementID, Date) %>% # get distinct combinations of those columns
      arrange(PlacementID, dmy(Date)) # order by those columns
    

    and that returns:

    #    PlacementID       Date
    # 1     12456362 10-08-2018
    # 2     12456362 19-08-2018
    # 3     12456362 24-08-2018
    # 4     12456362 31-08-2018
    # 5     12456363 12-08-2018
    # 6     12456363 19-08-2018
    # 7     12456363 25-08-2018
    # 8     12456363 30-08-2018
    # 9     12456364 10-08-2018
    # 10    12456364 25-08-2018
    # 11    12456364 30-08-2018
    

    Note that if you already have a date format for column Date you don't need to use lubridate and function ymd.