Search code examples
rdplyrtidyrreshape2

R: Possible to add summarise filed in dcast?


R: Possible to add summarise filed in dcast? User vs Number of order by month. and in between yearly summary add in it.

Check attached screenshot(excel's) for expected result.

User            Order_date          

a               02-01-2017          
b               02-02-2017          
a               02-08-2017          
c               02-05-2017          
a               02-08-2017          
s               02-06-2017          
c               02-03-2017          
s               02-04-2017          
b               02-06-2017          
c               02-11-2017          
a               02-11-2017          
s               02-11-2017          
c               02-01-2018          
s               02-01-2018          
b               02-02-2018          
b               02-10-2018          

Try to generate report but not displayed summary values which is displayed in excel, Please check the code and attached screenshot.

library(data.table)
library(lubridate)

df$start_year_month <- format(df$Month_Due, "%Y-%m")
#dcast(setDT(df), user ~ factor(start_year_month, levels = 1:12), sum, drop = FALSE)

datatable(dcast(df, user ~ start_year_month), filter = 'top',fun.aggregate =  ???? )

Click here for Screenshot


Solution

  • You can first count for each month, the number of different users by doing:

    library(tidyr)
    library(dplyr)
    library(lubridate)
    df <- df %>% mutate(Order_date = dmy(Order_date)) # Format the date from the reproducible example
    
    DF <- df %>% arrange(Order_date) %>% mutate(months = format(Order_date, "%b_%Y")) %>%
      mutate(months = factor(months, unique(months))) %>%
      group_by(months, User) %>% count() 
    
    # A tibble: 15 x 3
    # Groups:   months, User [15]
       months   User      n
       <fct>    <chr> <int>
     1 Jan_2017 a         1
     2 Feb_2017 b         1
     3 Mar_2017 c         1
     4 Apr_2017 s         1
     5 May_2017 c         1
     6 Jun_2017 b         1
     7 Jun_2017 s         1
     8 Aug_2017 a         2
     9 Nov_2017 a         1
    10 Nov_2017 c         1
    11 Nov_2017 s         1
    12 Jan_2018 c         1
    13 Jan_2018 s         1
    14 Feb_2018 b         1
    15 Oct_2018 b         1
    

    Then, you can create second dataframe with the count per Year:

    DF_Year <- df %>% arrange(Order_date) %>% mutate(months = paste(format(Order_date, "%Y"),"_Total",sep = "")) %>%
      mutate(months = factor(months, unique(months))) %>%
      group_by(months, User) %>% count() 
    
    # A tibble: 7 x 3
    # Groups:   months, User [7]
      months User      n
      <fct>  <chr> <int>
    1 2017   a         4
    2 2017   b         2
    3 2017   c         3
    4 2017   s         3
    5 2018   b         2
    6 2018   c         1
    7 2018   s         1
    

    You can bind both dataframe :

    DF_ALL <- bind_rows(DF, DF_Year)
    

    Finally, you can pivot your dataframe into a larger format and order columns based on the year:

    DF_Final <- DF_ALL %>% pivot_wider(names_from = months, values_from = n) %>%
      select(contains("2017"),contains("2018")) 
    
    # A tibble: 4 x 14
    # Groups:   User [4]
      User  Jan_2017 Feb_2017 Mar_2017 Apr_2017 May_2017 Jun_2017 Aug_2017 Nov_2017 `2017_Total` Jan_2018 Feb_2018 Oct_2018
      <chr>    <int>    <int>    <int>    <int>    <int>    <int>    <int>    <int>        <int>    <int>    <int>    <int>
    1 a            1       NA       NA       NA       NA       NA        2        1            4       NA       NA       NA
    2 b           NA        1       NA       NA       NA        1       NA       NA            2       NA        1        1
    3 c           NA       NA        1       NA        1       NA       NA        1            3        1       NA       NA
    4 s           NA       NA       NA        1       NA        1       NA        1            3        1       NA       NA
    # … with 1 more variable: `2018_Total` <int>
    

    Does it answer your question ?


    Reproducible example

    structure(list(User = c("a", "b", "a", "c", "a", "s", "c", "s", 
    "b", "c", "a", "s", "c", "s", "b", "b"), Order_date = structure(c(17168, 
    17199, 17380, 17288, 17380, 17319, 17227, 17258, 17319, 17472, 
    17472, 17472, 17533, 17533, 17564, 17806), class = "Date")), class = "data.frame", row.names = c(NA, 
    -16L))