Search code examples
rdplyrmultiple-columnsmutatewide-format-data

Mutate value of a range of columns if columns name meets another column value


I have a wide df with columns representing the months of many given years and the changes of colour in each month:

df <- data.frame(id = as.integer(c(123,124,125,126)),
                 no_change = as.character(c("May.2010", NA, NA, "Sep.2010")),
                 `Jan.2010` = as.character(c("green", "black", "pink", "grey")),
                 `Feb.2010` = as.character(c("green", "black", "pink", "grey")),
                 `Mar.2010` = as.character(c("green", "red", "pink", "grey")),
                 `Apr.2010` = as.character(c("green", "red", "pink", "grey")),
                 `May.2010` = as.character(c("green", "red", "pink", "grey")),
                 `Jun.2010` = as.character(c("green", "red", "pink", "grey")),
                 `Jul.2010` = as.character(c("green", "white", "pink", "grey")),
                 `Ago.2010` = as.character(c("red", "white", "pink", "grey")),
                 `Sep.2010` = as.character(c("red", "white", "pink", "grey")),
                 `Oct.2010` = as.character(c("red", "white", "pink", "grey")),
                 `Nov.2010` = as.character(c("red", "white", "pink", "grey")),
                 `Dez.2010` = as.character(c("red", "white", "grey", "blue"))
                 )
df     
   id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
1 123  May.2010    green    green    green    green    green    green    green      red      red      red      red      red
2 124      <NA>    black    black      red      red      red      red    white    white    white    white    white    white
3 125      <NA>     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     grey
4 126  Sep.2010     grey     grey     grey     grey     grey     grey     grey     grey     grey     grey     grey     blue     

I want to apply NA to each column that contains a month equal to and above that specified in column 'no_change'. This is the desired output:

   id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
1 123  May.2010    green    green    green    green       NA       NA       NA       NA       NA       NA       NA       NA
2 124      <NA>    black    black      red      red      red      red    white    white    white    white    white    white
3 125      <NA>     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     grey
4 126  Sep.2010     grey     grey     grey     grey     grey     grey     grey     grey       NA       NA       NA       NA

Solution

  • You can pivot the format into a "long" format, and find out which rows should be turned into NA.

    library(tidyverse)
    
    df %>% 
      pivot_longer(ends_with("2010")) %>% 
      group_by(id) %>% 
      mutate(value = ifelse(cumsum(name == no_change & !is.na(no_change)), NA, value)) %>% 
      pivot_wider() %>% 
      ungroup()
    
    # A tibble: 4 × 14
         id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
      <int> <chr>     <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>   
    1   123 May.2010  green    green    green    green    NA       NA       NA       NA       NA       NA       NA       NA      
    2   124 NA        black    black    red      red      red      red      white    white    white    white    white    white   
    3   125 NA        pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     grey    
    4   126 Sep.2010  grey     grey     grey     grey     grey     grey     grey     grey     NA       NA       NA       NA