Say I have data like this:
d <- tibble::tribble(
~sit_comfy_sofa_1, ~sit_comfy_sofa_2, ~sit_comfy_sofa_3, ~sit_comfy_sofa_4, ~sit_comfy_couch_1, ~sit_comfy_couch_2, ~sit_comfy_couch_3, ~sit_comfy_couch_4, ~sit_comfy_settee_1, ~sit_comfy_settee_2, ~sit_comfy_settee_3, ~sit_comfy_settee_4,
1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L,
0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L
)
This tibble has three 'categories' of columns: one for _sofa_, one for _couch_, and one for _settee_. I'm trying to look across each category and construct a new variable whose value depends on which of the columns within that category == 1.
I wrote this function to attempt that:
cleaning_fcn <- function(.df, .x){
  .df %>%
    mutate(!!sym(paste0("explain_", .x)) := case_when(
      !!sym(paste0("sit_comfy_", .x, "_1")) == 1 ~ "Just better",
      !!sym(paste0("sit_comfy_", .x, "_2")) == 1 ~ "Nice shape",
      !!sym(paste0("sit_comfy_", .x, "_3")) == 1 ~ "Like the color",
      !!sym(paste0("sit_comfy_", .x, "_4")) == 1 ~ "Nice material"),
      !!sym(paste0("explain_", .x)) := factor(!!sym(paste0("explain_", .x)),
                                             levels = c("Just better", "Nice shape",
                                                        "Like the color", "Nice material")))
}
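As a quick check, here is a minimal sketch. It assumes dplyr and rlang are installed, and it repeats the data and `cleaning_fcn` from above so it runs on its own. Calling the function for a single category already returns the original four rows with one new `explain_sofa` column. That means the extra rows must come from how the three calls are combined, not from the function itself.

```r
library(dplyr)
library(rlang)

# Data from the question
d <- tibble::tribble(
  ~sit_comfy_sofa_1, ~sit_comfy_sofa_2, ~sit_comfy_sofa_3, ~sit_comfy_sofa_4,
  ~sit_comfy_couch_1, ~sit_comfy_couch_2, ~sit_comfy_couch_3, ~sit_comfy_couch_4,
  ~sit_comfy_settee_1, ~sit_comfy_settee_2, ~sit_comfy_settee_3, ~sit_comfy_settee_4,
  1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L,
  0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L,
  0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
  0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L
)

# Function from the question: adds one explain_<category> factor column
cleaning_fcn <- function(.df, .x){
  .df %>%
    mutate(!!sym(paste0("explain_", .x)) := case_when(
      !!sym(paste0("sit_comfy_", .x, "_1")) == 1 ~ "Just better",
      !!sym(paste0("sit_comfy_", .x, "_2")) == 1 ~ "Nice shape",
      !!sym(paste0("sit_comfy_", .x, "_3")) == 1 ~ "Like the color",
      !!sym(paste0("sit_comfy_", .x, "_4")) == 1 ~ "Nice material"),
      !!sym(paste0("explain_", .x)) := factor(!!sym(paste0("explain_", .x)),
                                             levels = c("Just better", "Nice shape",
                                                        "Like the color", "Nice material")))
}

# A single call keeps the row count of d and adds exactly one column
out <- cleaning_fcn(d, "sofa")
nrow(out)                        # 4
as.character(out$explain_sofa)   # "Just better" "Nice material" "Nice shape" "Like the color"
```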
However, when I call it, I end up with a tibble that has 3x as many rows as the original tibble.
library(tidyverse)
purrr::map_dfr(
  .x = tidyselect::all_of(c("sofa", "couch", "settee")),
  .f = ~ cleaning_fcn(.df = d, .x))
Can anyone see where I'm going wrong?
Essentially, I want to achieve the same as the code below but ideally it'd be a function (and just generally with a lot less repetition):
d <- d %>%
  mutate(explain_sofa = case_when(
    sit_comfy_sofa_1 == 1 ~ "Just better",
    sit_comfy_sofa_2 == 1 ~ "Nice shape",
    sit_comfy_sofa_3 == 1 ~ "Like the color",
    sit_comfy_sofa_4 == 1 ~ "Nice material"),
    explain_sofa = factor(explain_sofa, levels = c("Just better", "Nice shape",
                                                   "Like the color", "Nice material")))
d <- d %>%
  mutate(explain_couch = case_when(
    sit_comfy_couch_1 == 1 ~ "Just better",
    sit_comfy_couch_2 == 1 ~ "Nice shape",
    sit_comfy_couch_3 == 1 ~ "Like the color",
    sit_comfy_couch_4 == 1 ~ "Nice material"),
    explain_couch = factor(explain_couch, levels = c("Just better", "Nice shape",
                                                     "Like the color", "Nice material")))
d <- d %>%
  mutate(explain_settee = case_when(
    sit_comfy_settee_1 == 1 ~ "Just better",
    sit_comfy_settee_2 == 1 ~ "Nice shape",
    sit_comfy_settee_3 == 1 ~ "Like the color",
    sit_comfy_settee_4 == 1 ~ "Nice material"),
    explain_settee = factor(explain_settee, levels = c("Just better", "Nice shape",
                                                       "Like the color", "Nice material")))
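Using map_dfr, you are creating a list of data frames, one for each of your categories, which are then bound by rows. Each element is a full copy of d with only one explain_ column added, so you end up with a data frame with 3 times the number of rows (and NA in the explain_ columns that belong to the other categories). One option would be to use purrr::reduce instead, which passes the result of each call on as the input of the next: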
library(tidyverse)
purrr::reduce(.x = c("sofa", "couch", "settee"), .f = cleaning_fcn, .init = d)
#> # A tibble: 4 × 15
#> sit_comfy_sofa_1 sit_comfy_sofa_2 sit_comfy_sofa_3 sit_comfy_sofa_4
#> <int> <int> <int> <int>
#> 1 1 0 0 0
#> 2 0 0 0 1
#> 3 0 1 0 0
#> 4 0 0 1 0
#> # ℹ 11 more variables: sit_comfy_couch_1 <int>, sit_comfy_couch_2 <int>,
#> # sit_comfy_couch_3 <int>, sit_comfy_couch_4 <int>, sit_comfy_settee_1 <int>,
#> # sit_comfy_settee_2 <int>, sit_comfy_settee_3 <int>,
#> # sit_comfy_settee_4 <int>, explain_sofa <fct>, explain_couch <fct>,
#> # explain_settee <fct>
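Here is a minimal sketch that makes the difference concrete. It again assumes dplyr, purrr and rlang are installed, and repeats the question's data and function so it runs standalone. `map_dfr()` stacks three 4-row copies of `d`, while `reduce()` with `.init = d` is equivalent to nesting the three calls by hand:

```r
library(dplyr)
library(purrr)
library(rlang)

# Data from the question
d <- tibble::tribble(
  ~sit_comfy_sofa_1, ~sit_comfy_sofa_2, ~sit_comfy_sofa_3, ~sit_comfy_sofa_4,
  ~sit_comfy_couch_1, ~sit_comfy_couch_2, ~sit_comfy_couch_3, ~sit_comfy_couch_4,
  ~sit_comfy_settee_1, ~sit_comfy_settee_2, ~sit_comfy_settee_3, ~sit_comfy_settee_4,
  1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L,
  0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L,
  0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
  0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L
)

# Function from the question: adds one explain_<category> factor column
cleaning_fcn <- function(.df, .x){
  .df %>%
    mutate(!!sym(paste0("explain_", .x)) := case_when(
      !!sym(paste0("sit_comfy_", .x, "_1")) == 1 ~ "Just better",
      !!sym(paste0("sit_comfy_", .x, "_2")) == 1 ~ "Nice shape",
      !!sym(paste0("sit_comfy_", .x, "_3")) == 1 ~ "Like the color",
      !!sym(paste0("sit_comfy_", .x, "_4")) == 1 ~ "Nice material"),
      !!sym(paste0("explain_", .x)) := factor(!!sym(paste0("explain_", .x)),
                                             levels = c("Just better", "Nice shape",
                                                        "Like the color", "Nice material")))
}

categories <- c("sofa", "couch", "settee")

# map_dfr(): each call starts again from the original d, and the three
# results are row-bound, so there are 3 x 4 rows
stacked <- map_dfr(categories, ~ cleaning_fcn(.df = d, .x))
nrow(stacked)                         # 12
sum(is.na(stacked$explain_sofa))      # 8 (rows from the couch and settee copies)

# reduce(): each call receives the output of the previous one
reduced <- reduce(categories, cleaning_fcn, .init = d)
nested  <- cleaning_fcn(cleaning_fcn(cleaning_fcn(d, "sofa"), "couch"), "settee")
identical(reduced, nested)            # TRUE
nrow(reduced)                         # 4
```

The key design point is that `reduce()` feeds the growing data frame back into `cleaning_fcn` as `.df`. That is why the function's first argument has to be the data and its second the category name.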