Search code examples
dplyrconditional-operator

Mutate by a column element based on a condition


I am trying to mutate a column such that:

  1. The dataframe is grouped according to bug
  2. The mutated column (treatmentToGrowthRatio) is equal to:

value / (value such that Treatment == "Growth").

What I have:

df <- tibble(
  type = as.factor(c("bug1", "bug1", "bug1", "bug2", "bug2", "bug2", "bug3", "bug3", "bug3", "blank")),
  treatment = c(rep(c("TreatA", "TreatB", "Growth"),3), "Blank"),
  value = 1:10
)

The operation I'm effectively after:

df %>% group_by(bug) %>% 
  mutate(
    treatmentToGrowthRatio =
      value/
      ## value where treatment == growth 
      ## (i.e. for bug 1 = 3; for bug 2 = 6; for bug 3 = 9; for Blank = NA)
  )

To give the desired output:

dfFinal <- tibble(
  type = as.factor(c("bug1", "bug1", "bug1", "bug2", "bug2", "bug2", "bug3", "bug3", "bug3", "blank")),
  treatment = c(rep(c("TreatA", "TreatB", "Growth"),3), "Blank"),
  value = 1:10,
  treatmentToGrowthRatio = c(1/3, 2/3, 1, 4/6, 5/6, 1, 7/9, 8/9, 1, NA)
)

The closest I've gotten gives treatmentToGrowthRatio = 1 where Treatment == "Growth" from:

df %>% group_by(type) %>% 
  mutate(
    treatmentToGrowthRatio =
      value/
      case_when(
        str_detect(treatment,
                   "Growth") ~ value
      )
  )

Appreciate any insights! Thanks.


Solution

  • Your problem statement is not clear. There is no treatment == "Growth" for group type == "blank". What do you expect to happen in that case?

    Noting my comment above, I will ignore the row with type == "blank". Then you either do

    library(dplyr)
    df %>%
        filter(type != "blank") %>%
        group_by(type) %>%
        mutate(treatmentToGrowthRatio = value / value[treatment == "Growth"]) %>%
        ungroup()
    ## A tibble: 9 x 4
    #  type  treatment value treatmentToGrowthRatio
    #  <fct> <chr>     <int>                  <dbl>
    #1 bug1  TreatA        1                  0.333
    #2 bug1  TreatB        2                  0.667
    #3 bug1  Growth        3                  1    
    #4 bug2  TreatA        4                  0.667
    #5 bug2  TreatB        5                  0.833
    #6 bug2  Growth        6                  1    
    #7 bug3  TreatA        7                  0.778
    #8 bug3  TreatB        8                  0.889
    #9 bug3  Growth        9                  1     
    

    Or (perhaps more elegantly) reshape from long to wide, and then divide values from relevant columns.

    library(dplyr)
    library(tidyr)
    df %>%
        pivot_wider(names_from = treatment) %>%
        mutate(across(starts_with("Treat"), ~ .x / Growth))
    ## A tibble: 4 x 5
    #  type  TreatA TreatB Growth Blank
    #  <fct>  <dbl>  <dbl>  <int> <int>
    #1 bug1   0.333  0.667      3    NA
    #2 bug2   0.667  0.833      6    NA
    #3 bug3   0.778  0.889      9    NA
    #4 blank NA     NA         NA    10
    

    Then reshape again if necessary.