Search code examples
rdataframesummary

R create summary percentages in a dataframe using cells that will become different denominators


Here is my dataframe. It's not very long - only six rows.

enter image description here

df <- structure(list(Send_Month = c("2021-05", "2021-06", "2021-07", 
"2021-05", "2021-06", "2021-07"), Order_Result = c("No", "No", 
"No", "Yes", "Yes", "Yes"), Email_Send = c(135, 495, 475, 7, 
28, 25), Unique_Email_Opens = c(45, 149, 143, 7, 28, 25), Unique_Email_Clicks = c(6, 
21, 10, 7, 28, 25), Total_Orders = c(37, 106, 46, 7, 28, 25)), row.names = c(NA, 
-6L), groups = structure(list(Send_Month = c("2021-05", "2021-06", 
"2021-07"), .rows = structure(list(c(1L, 4L), c(2L, 5L), c(3L, 
6L)), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr", 
"list"))), row.names = c(NA, -3L), class = c("tbl_df", "tbl", 
"data.frame"), .drop = TRUE), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"))

I'm having trouble picturing how I'm going to get summary results that I can graph into a bar chart. I'm trying to make some groupings here:

When the month is the same (for example, "2021-05") and I look at the Email_Send variable, I can see that 7 out of 142 (which is 135+7) emails that were sent led to an order. I can also see that 7 out of 52 (which is 45+7) emails that were opened led to an order. And 7 out of 13 (which is 6+7) emails that were clicked on led to an order. That is for the "2021-05" group.

How can I create these statistics for each grouping so I can see how the percentage would change for each group, where the denominator keeps shifting?

I tried using the janitor package for a second and just to orient myself I first filtered to only include that 2021-05 group:

df_may <- df %>%
  filter(Send_Month == "2021-05")

df_may %>%
  adorn_totals("row")

enter image description here

But I don't know if this method is very flexible for looking at all the groups together and also I don't know if I really want a summary row or a new column. So I don't know if I'm heading in the right direction here.


Solution

  • Thank you dear @ThomasIsCoding for an excellent tip of using proporstions function in place of .x/sum(.x).

    library(dplyr)
    library(purrr)
    
    df %>%
      group_by(Send_Month, .add = TRUE) %>%
      group_split() %>%
      map(~ .x %>% 
            mutate(across(!c(1, 2), ~ proportions(.x))))
    
    [[1]]
    # A tibble: 2 x 6
      Send_Month Order_Result Email_Send Unique_Email_Opens Unique_Email_Clicks Total_Orders
      <chr>      <chr>             <dbl>              <dbl>               <dbl>        <dbl>
    1 2021-05    No               0.951               0.865               0.462        0.841
    2 2021-05    Yes              0.0493              0.135               0.538        0.159
    
    [[2]]
    # A tibble: 2 x 6
      Send_Month Order_Result Email_Send Unique_Email_Opens Unique_Email_Clicks Total_Orders
      <chr>      <chr>             <dbl>              <dbl>               <dbl>        <dbl>
    1 2021-06    No               0.946               0.842               0.429        0.791
    2 2021-06    Yes              0.0535              0.158               0.571        0.209
    
    [[3]]
    # A tibble: 2 x 6
      Send_Month Order_Result Email_Send Unique_Email_Opens Unique_Email_Clicks Total_Orders
      <chr>      <chr>             <dbl>              <dbl>               <dbl>        <dbl>
    1 2021-07    No                 0.95              0.851               0.286        0.648
    2 2021-07    Yes                0.05              0.149               0.714        0.352