R: Possible to add summarise filed in dcast? User vs Number of order by month. and in between yearly summary add in it.
Check attached screenshot(excel's) for expected result.
User Order_date
a 02-01-2017
b 02-02-2017
a 02-08-2017
c 02-05-2017
a 02-08-2017
s 02-06-2017
c 02-03-2017
s 02-04-2017
b 02-06-2017
c 02-11-2017
a 02-11-2017
s 02-11-2017
c 02-01-2018
s 02-01-2018
b 02-02-2018
b 02-10-2018
Try to generate report but not displayed summary values which is displayed in excel, Please check the code and attached screenshot.
library(data.table)
library(lubridate)
df$start_year_month <- format(df$Month_Due, "%Y-%m")
#dcast(setDT(df), user ~ factor(start_year_month, levels = 1:12), sum, drop = FALSE)
datatable(dcast(df, user ~ start_year_month), filter = 'top',fun.aggregate = ???? )
You can first count for each month, the number of different users by doing:
library(tidyr)
library(dplyr)
library(lubridate)
df <- df %>% mutate(Order_date = dmy(Order_date)) # Format the date from the reproducible example
DF <- df %>% arrange(Order_date) %>% mutate(months = format(Order_date, "%b_%Y")) %>%
mutate(months = factor(months, unique(months))) %>%
group_by(months, User) %>% count()
# A tibble: 15 x 3
# Groups: months, User [15]
months User n
<fct> <chr> <int>
1 Jan_2017 a 1
2 Feb_2017 b 1
3 Mar_2017 c 1
4 Apr_2017 s 1
5 May_2017 c 1
6 Jun_2017 b 1
7 Jun_2017 s 1
8 Aug_2017 a 2
9 Nov_2017 a 1
10 Nov_2017 c 1
11 Nov_2017 s 1
12 Jan_2018 c 1
13 Jan_2018 s 1
14 Feb_2018 b 1
15 Oct_2018 b 1
Then, you can create second dataframe with the count per Year:
DF_Year <- df %>% arrange(Order_date) %>% mutate(months = paste(format(Order_date, "%Y"),"_Total",sep = "")) %>%
mutate(months = factor(months, unique(months))) %>%
group_by(months, User) %>% count()
# A tibble: 7 x 3
# Groups: months, User [7]
months User n
<fct> <chr> <int>
1 2017 a 4
2 2017 b 2
3 2017 c 3
4 2017 s 3
5 2018 b 2
6 2018 c 1
7 2018 s 1
You can bind both dataframe :
DF_ALL <- bind_rows(DF, DF_Year)
Finally, you can pivot your dataframe into a larger format and order columns based on the year:
DF_Final <- DF_ALL %>% pivot_wider(names_from = months, values_from = n) %>%
select(contains("2017"),contains("2018"))
# A tibble: 4 x 14
# Groups: User [4]
User Jan_2017 Feb_2017 Mar_2017 Apr_2017 May_2017 Jun_2017 Aug_2017 Nov_2017 `2017_Total` Jan_2018 Feb_2018 Oct_2018
<chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 a 1 NA NA NA NA NA 2 1 4 NA NA NA
2 b NA 1 NA NA NA 1 NA NA 2 NA 1 1
3 c NA NA 1 NA 1 NA NA 1 3 1 NA NA
4 s NA NA NA 1 NA 1 NA 1 3 1 NA NA
# … with 1 more variable: `2018_Total` <int>
Does it answer your question ?
Reproducible example
structure(list(User = c("a", "b", "a", "c", "a", "s", "c", "s",
"b", "c", "a", "s", "c", "s", "b", "b"), Order_date = structure(c(17168,
17199, 17380, 17288, 17380, 17319, 17227, 17258, 17319, 17472,
17472, 17472, 17533, 17533, 17564, 17806), class = "Date")), class = "data.frame", row.names = c(NA,
-16L))