I am trying to create a new data frame in R using an existing data frame of items bought in transactions as shown below:
dput output for the data:
structure(list(Transaction = c(1L, 2L, 2L, 3L, 3L, 3L), Item = c("Bread",
"Scandinavian", "Scandinavian", "Hot chocolate", "Jam", "Cookies"
), date_time = c("30/10/2016 09:58", "30/10/2016 10:05", "30/10/2016 10:05",
"30/10/2016 10:07", "30/10/2016 10:07", "30/10/2016 10:07"),
period_day = c("morning", "morning", "morning", "morning",
"morning", "morning"), weekday_weekend = c("weekend", "weekend",
"weekend", "weekend", "weekend", "weekend"), Year = c("2016",
"2016", "2016", "2016", "2016", "2016"), Month = c("October",
"October", "October", "October", "October", "October")), row.names = c(NA,
6L), class = "data.frame")
As you can see in the example, the rows are due to each individual product bought, not the transactions themselves (hence why Transaction 2 is both rows 2 and 3).
I would like to make a new table where the rows are the different transactions (1, 2, 3, etc.) and the different columns are categorical (Bread = 0, 1) so I can perform apriori analysis.
Any idea how I can group the different transactions together and then create these new columns?
Assuming your dataframe is called df
you can use tidyr
's pivot_wider
:
df1 <- tidyr::pivot_wider(df, names_from = Item, values_from = Item,
values_fn = n_distinct, values_fill = 0)
df1
# Transaction date_time period_day weekday_weekend Year Month Bread Scandinavian `Hot chocolate` Jam Cookies
# <int> <chr> <chr> <chr> <chr> <chr> <int> <int> <int> <int> <int>
#1 1 30/10/2016 09… morning weekend 2016 Octob… 1 0 0 0 0
#2 2 30/10/2016 10… morning weekend 2016 Octob… 0 1 0 0 0
#3 3 30/10/2016 10… morning weekend 2016 Octob… 0 0 1 1 1
Or with data.table's dcast
:
library(data.table)
dcast(setDT(df), Transaction+date_time+period_day + weekday_weekend +
Year + Month ~ Item, value.var = 'Item', fun.aggregate = uniqueN)