I basically do reporting on marketing data and am trying work on some automation in R.
In excel we can drag and select columns into the "Rows" area of a pivot table to summarise the data. I have a similar situation where I would like to group or summarise 2 out of the 3 columns of my dataset, where the 3rd column is the key which differentiates the other two.
In detail: in the below dataset I have 3 columns(with dummy data) naming, Placement ID, Date and Device. Here we see that a Placement ID is repeating on the same day 3 times but on different Devices.
Placement ID Date Device
12456362 31-08-2018 Mobile
12456363 30-08-2018 Desktop
12456364 25-08-2018 Tablet
12456362 31-08-2018 Tablet
12456363 30-08-2018 Desktop
12456364 25-08-2018 Mobile
12456362 31-08-2018 Desktop
12456363 30-08-2018 Mobile
12456364 25-08-2018 Tablet
12456362 24-08-2018 Tablet
12456363 12-08-2018 Desktop
12456364 10-08-2018 Mobile
12456362 19-08-2018 Desktop
12456363 25-08-2018 Mobile
12456364 25-08-2018 Tablet
12456362 10-08-2018 Mobile
12456363 19-08-2018 Desktop
12456364 30-08-2018 Tablet
Now I have reference file where I have to fill cost for each placement ID. The reference file has only two columns, Date and Placement ID where there are no duplicates, i.e for each Placement ID there are different dates and the respective cost associated with it. Here is an example:
Placement ID Date Cost
12456362 10-08-2018 1200
12456362 19-08-2018 1300
12456362 24-08-2018 1400
12456362 31-08-2018 1500
12456363 12-08-2018 1600
12456363 19-08-2018 1700
12456363 25-08-2018 1800
12456363 30-08-2018 1900
12456364 10-08-2018 2000
12456364 25-08-2018 2100
12456364 30-08-2018 2200
Now in excel I would have simply created the pivot table and have had both date and placement ID dragged under the 'Rows' field to have it all aggregated, here is the desired output:
Placement ID Date
12456362 10-08-2018
12456362 19-08-2018
12456362 24-08-2018
12456362 31-08-2018
12456363 12-08-2018
12456363 19-08-2018
12456363 25-08-2018
12456363 30-08-2018
12456364 10-08-2018
12456364 25-08-2018
12456364 30-08-2018
I would like to achieve the above result in R. I have tried and melt and cast function, however I am not associating the 2 columns with any values when I melt the dataset, which i cannot do, so df_melt is returning more rows than the original count of df. Here is the code:
df_Melt <- melt(_df, id.vars=c("Date","Placement ID"))
Thanks in advance.
# example data
df = read.table(text = "
PlacementID Date Device
12456362 31-08-2018 Mobile
12456363 30-08-2018 Desktop
12456364 25-08-2018 Tablet
12456362 31-08-2018 Tablet
12456363 30-08-2018 Desktop
12456364 25-08-2018 Mobile
12456362 31-08-2018 Desktop
12456363 30-08-2018 Mobile
12456364 25-08-2018 Tablet
12456362 24-08-2018 Tablet
12456363 12-08-2018 Desktop
12456364 10-08-2018 Mobile
12456362 19-08-2018 Desktop
12456363 25-08-2018 Mobile
12456364 25-08-2018 Tablet
12456362 10-08-2018 Mobile
12456363 19-08-2018 Desktop
12456364 30-08-2018 Tablet
", header=T, stringsAsFactors=T)
library(dplyr)
library(lubridate)
df %>%
distinct(PlacementID, Date) %>% # get distinct combinations of those columns
arrange(PlacementID, dmy(Date)) # order by those columns
and that returns:
# PlacementID Date
# 1 12456362 10-08-2018
# 2 12456362 19-08-2018
# 3 12456362 24-08-2018
# 4 12456362 31-08-2018
# 5 12456363 12-08-2018
# 6 12456363 19-08-2018
# 7 12456363 25-08-2018
# 8 12456363 30-08-2018
# 9 12456364 10-08-2018
# 10 12456364 25-08-2018
# 11 12456364 30-08-2018
Note that if you already have a date format for column Date
you don't need to use lubridate
and function ymd
.