r dplyr purrr

map inside a mutate exploding number of rows in output tibble


Say I have data like this:

d <- tibble::tribble(
  ~sit_comfy_sofa_1, ~sit_comfy_sofa_2, ~sit_comfy_sofa_3, ~sit_comfy_sofa_4, ~sit_comfy_couch_1, ~sit_comfy_couch_2, ~sit_comfy_couch_3, ~sit_comfy_couch_4, ~sit_comfy_settee_1, ~sit_comfy_settee_2, ~sit_comfy_settee_3, ~sit_comfy_settee_4,
                 1L,                0L,                0L,                0L,                 0L,                 1L,                 0L,                 0L,                  0L,                  0L,                  1L,                  0L,
                 0L,                0L,                0L,                1L,                 0L,                 0L,                 0L,                 1L,                  0L,                  1L,                  0L,                  0L,
                 0L,                1L,                0L,                0L,                 1L,                 0L,                 0L,                 0L,                  1L,                  0L,                  0L,                  0L,
                 0L,                0L,                1L,                0L,                 0L,                 0L,                 1L,                 0L,                  0L,                  0L,                  0L,                  1L
  )

This tibble has three 'categories' of columns, one for _sofa_, one for _couch_, and one for _settee_. I'm trying to look across each category and construct a new variable whose value depends on which of the columns within that category equals 1.

I wrote this function to attempt that:

cleaning_fcn <- function(.df, .x){
  .df %>% 
    mutate(!!sym(paste0("explain_", .x)) := case_when(
      !!sym(paste0("sit_comfy_", .x ,"_1")) == 1 ~ "Just better",
      !!sym(paste0("sit_comfy_", .x, "_2")) == 1 ~ "Nice shape",
      !!sym(paste0("sit_comfy_", .x ,"_3")) == 1 ~ "Like the color",
      !!sym(paste0("sit_comfy_", .x ,"_4")) == 1 ~ "Nice material"),
      !!sym(paste0("explain_", .x)) := factor(!!sym(paste0("explain_", .x)), 
                                               levels = c("Just better", "Nice shape",
                                                          "Like the color", "Nice material")))
}

However, when I call it I end up with a tibble that has 3x as many rows as the original tibble.

require(tidyverse)

purrr::map_dfr(
    .x = tidyselect::all_of(c("sofa", "couch", "settee")),
    .f = ~ cleaning_fcn(.df = d, .x))

Can anyone see where I'm going wrong?

Essentially, I want the same result as the code below, but ideally from a function and with a lot less repetition:

d <- d %>% 
  mutate(explain_sofa = case_when(
    sit_comfy_sofa_1 == 1 ~ "Just better",
    sit_comfy_sofa_2 == 1 ~ "Nice shape",
    sit_comfy_sofa_3 == 1 ~ "Like the color",
    sit_comfy_sofa_4 == 1 ~ "Nice material"),
    explain_sofa = factor(explain_sofa, levels = c("Just better", "Nice shape",
                                                   "Like the color", "Nice material")))
d <- d %>% 
  mutate(explain_couch = case_when(
    sit_comfy_couch_1 == 1 ~ "Just better",
    sit_comfy_couch_2 == 1 ~ "Nice shape",
    sit_comfy_couch_3 == 1 ~ "Like the color",
    sit_comfy_couch_4 == 1 ~ "Nice material"),
    explain_couch = factor(explain_couch, levels = c("Just better", "Nice shape",
                                                   "Like the color", "Nice material")))

d <- d %>% 
  mutate(explain_settee = case_when(
    sit_comfy_settee_1 == 1 ~ "Just better",
    sit_comfy_settee_2 == 1 ~ "Nice shape",
    sit_comfy_settee_3 == 1 ~ "Like the color",
    sit_comfy_settee_4 == 1 ~ "Nice material"),
    explain_settee = factor(explain_settee, levels = c("Just better", "Nice shape",
                                                    "Like the color", "Nice material")))

Solution

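  • With map_dfr you create a list of data frames, one for each of your categories, and these are then bound together by rows. Each of them is a full copy of d with one extra column, so you end up with a data frame with 3 times the number of rows. One option would be to use purrr::reduce instead, which passes the result of each call on to the next one: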

    library(tidyverse)
    
    purrr::reduce(.x = c("sofa", "couch", "settee"), .f = cleaning_fcn, .init = d)
    #> # A tibble: 4 × 15
    #>   sit_comfy_sofa_1 sit_comfy_sofa_2 sit_comfy_sofa_3 sit_comfy_sofa_4
    #>              <int>            <int>            <int>            <int>
    #> 1                1                0                0                0
    #> 2                0                0                0                1
    #> 3                0                1                0                0
    #> 4                0                0                1                0
    #> # ℹ 11 more variables: sit_comfy_couch_1 <int>, sit_comfy_couch_2 <int>,
    #> #   sit_comfy_couch_3 <int>, sit_comfy_couch_4 <int>, sit_comfy_settee_1 <int>,
    #> #   sit_comfy_settee_2 <int>, sit_comfy_settee_3 <int>,
    #> #   sit_comfy_settee_4 <int>, explain_sofa <fct>, explain_couch <fct>,
    #> #   explain_settee <fct>
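
To make the difference concrete, here is a minimal sketch on a made-up two-row tibble (`toy`, with only the sofa and couch categories, is an assumption for illustration and not the data from the question). It reuses `cleaning_fcn` as written in the question and compares the row-bound result of `map_dfr` with the folded result of `reduce`, which should match simply nesting the calls by hand.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for `d`: two rows, two categories
toy <- tibble::tibble(
  sit_comfy_sofa_1  = c(1L, 0L), sit_comfy_sofa_2  = c(0L, 1L),
  sit_comfy_sofa_3  = c(0L, 0L), sit_comfy_sofa_4  = c(0L, 0L),
  sit_comfy_couch_1 = c(0L, 0L), sit_comfy_couch_2 = c(0L, 0L),
  sit_comfy_couch_3 = c(1L, 0L), sit_comfy_couch_4 = c(0L, 1L)
)

# Same function as in the question, with sym() taken explicitly from rlang
cleaning_fcn <- function(.df, .x) {
  lvls <- c("Just better", "Nice shape", "Like the color", "Nice material")
  .df %>%
    mutate(
      !!rlang::sym(paste0("explain_", .x)) := case_when(
        !!rlang::sym(paste0("sit_comfy_", .x, "_1")) == 1 ~ "Just better",
        !!rlang::sym(paste0("sit_comfy_", .x, "_2")) == 1 ~ "Nice shape",
        !!rlang::sym(paste0("sit_comfy_", .x, "_3")) == 1 ~ "Like the color",
        !!rlang::sym(paste0("sit_comfy_", .x, "_4")) == 1 ~ "Nice material"
      ),
      !!rlang::sym(paste0("explain_", .x)) :=
        factor(!!rlang::sym(paste0("explain_", .x)), levels = lvls)
    )
}

cats <- c("sofa", "couch")

# map_dfr: every call starts from `toy` again; the copies are stacked by row
stacked <- map_dfr(cats, ~ cleaning_fcn(toy, .x))

# reduce: each call receives the result of the previous call
folded <- reduce(cats, cleaning_fcn, .init = toy)

# What reduce does, written out by hand
nested <- cleaning_fcn(cleaning_fcn(toy, "sofa"), "couch")

nrow(stacked)  # 4: two copies of the two-row tibble
nrow(folded)   # 2: one tibble with both explain_* columns
all.equal(folded, nested)
```

In `stacked`, each `explain_*` column is filled only in the rows that came from its own category's copy and is `NA` elsewhere, which is another sign that the copies were stacked rather than combined.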