Search code examples
rtransposedcast

how to calculate unique count using dcast in R


I'm using dcast to transpose the following table

date               event          user_id
25-07-2020         Create          3455
25-07-2020         Visit           3567
25-07-2020         Visit           3567
25-07-2020         Add             3567
25-07-2020         Add             3678
25-07-2020         Add             3678
25-07-2020         Create          3567
24-07-2020         Edit            3871

I'm using dcast to transpose to have my events as columns and count user_id

dae_summ <- dcast(ahoy_events, date ~ event, value.var="user_id")

But I'm not getting unique user id's. its counting the same user_id multiple times. What can I do to get one user_id counted only one time for the same date and event.


Solution

  • We could use uniqueN from data.table

    library(data.table)
    dcast(setDT(ahoy_events), date ~ event, fun.aggregate = uniqueN)
    #         date Add Create Edit Visit
    #1: 24-07-2020   0      0    1     0
    #2: 25-07-2020   2      2    0     1
    

    Or using pivot_wider from tidyr with values_fn specified as n_distinct

    library(tidyr)
    library(dplyr)
    ahoy_events %>%
       pivot_wider(names_from = event, values_from = user_id, 
          values_fn = list(user_id = n_distinct), values_fill = list(user_id = 0))
    # A tibble: 2 x 5
    #   date       Create Visit   Add  Edit
    #  <chr>       <int> <int> <int> <int>
    #1 25-07-2020      2     1     2     0
    #2 24-07-2020      0     0     0     1
    

    data

    ahoy_events <- structure(list(date = c("25-07-2020", "25-07-2020", "25-07-2020", 
    "25-07-2020", "25-07-2020", "25-07-2020", "25-07-2020", "24-07-2020"
    ), event = c("Create", "Visit", "Visit", "Add", "Add", "Add", 
    "Create", "Edit"), user_id = c(3455L, 3567L, 3567L, 3567L, 3678L, 
    3678L, 3567L, 3871L)), class = "data.frame", row.names = c(NA, 
    -8L))