Given this dataframe:
library(tidyverse)
library(openxlsx)
df.1 <- tribble(
~customer ,~period, ~cost1, ~cost2 , ~prod,
'cust1', '202201', 5, 10, 'online',
'cust1', '202202', 5, 10, 'online',
'cust1', '202203', 5, 10, 'in-person',
'cust1', '202204', 5, 10, 'in-person',
'cust2', '202203', 5, 10,'online',
'cust2', '202204', 5, 10, 'in-person',
'cust2', '202202', 5, 10,'online',
'cust3', '202204', 5, 10,'online'
'cust4', '202101', NA, NA, 'online',
'cust4', '202102', NA,10, 'online'
)
I am trying to create an excel file for each customer and dynamically add tabs based on the variable prod
, where some files may have 1 tab and others more than 1 based upon how many products they have.
I would also like to remove any column where all values are NA
so in this simple example, the file for cust4 would not have a column cost1
.
Here is what I tried. It seems to split the file by customer and product, but it doesn't- create a file for each customer and it doesn't remove the column where all values are NA
list(tab1 = df.1) %>%
map(\(df) split(df, list(df.1$customer, df.1$prod), drop =T)) %>%
janitor::remove_empty(., which = 'cols') %>%
iwalk(\(df, nm) write.xlsx(df, paste0(nm, ".xlsx"), startRow = 6))
How can I accomplish this? I would expect 4 files with a tab for each product they actually have, and in cust4's case only 4 columns instead of 5
Here is one approach to achieve your desired result, which
customer
iwalk
to loop over the list of customer dataframesprod
library(tidyverse)
library(openxlsx)
df.1 %>%
split(.$customer) %>%
iwalk(\(df, nm) {
df %>%
janitor::remove_empty(which = 'cols') %>%
split(.$prod) |>
write.xlsx(paste0(nm, ".xlsx"), startRow = 6)
})
read.xlsx("cust4.xlsx", "online")
#> customer period cost2 prod
#> 1 cust4 202101 NA online
#> 2 cust4 202102 10 online
And thanks to the comment by @MrFlick we could achieve the same result using the base pipe |>
like so:
df.1 |>
split(~customer) %>%
iwalk(\(df, nm) {
df |>
janitor::remove_empty(which = 'cols') |>
split(~prod) |>
write.xlsx(paste0(nm, ".xlsx"), startRow = 6)
})
EDIT You could remove empty cols per product by first splitting by prod
and by using map
to get rid of the empty cols like so:
df.1 |>
split(~customer) %>%
iwalk(\(df, nm) {
df |>
split(~prod) |>
map(~janitor::remove_empty(.x, which = 'cols')) |>
write.xlsx(paste0(nm, ".xlsx"), startRow = 6)
})