r, dplyr, mutate

Mutate a column for n% of the data frame, in descending order of another column (R)


I have a data frame:

library(dplyr)

df <- data.frame(ID = c(1, 2, 3, 4, 5, 5, 7, 8),
                 var1 = c('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'),
                 var2 = c(1, 1, 0, 0, 1, 1, 0, 0),
                 var3 = c(50, 40, 30, 45, 33, 51, 70, 46))

I would like to set var2 to 0.3 for 25% of the rows. At the moment I pick those rows at random using:

df %>%
  mutate(var2 = case_when(
    sample(n()) <= n() * 0.25 ~ 0.3,
    TRUE ~ var2
  ))

However, I would like the 25% of rows to be chosen in descending order of var3 rather than at random, so that the output is:

  ID var1 var2 var3
1  1    a    1   50
2  2    b    1   40
3  3    c    0   30
4  4    d    0   45
5  5    e    1   33
6  5    f  0.3   51
7  7    g  0.3   70
8  8    h    0   46

Rows 6 and 7 have been modified because they have the second-highest and highest values of var3, respectively. I need to be able to vary the percentage of rows that are mutated, with the change always applied in descending order of var3.

Thank you in advance


Solution

  • Solution using arrange(): record the original row order, sort by var3 in descending order, update the top 25% of rows, then restore the original order.

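    df %>%
      mutate(Row = row_number()) %>%          # remember the original row order
      arrange(desc(var3)) %>%                 # largest var3 first
      mutate(Magnitude_index = row_number(),  # rank by var3 (1 = largest)
             var2 = if_else(Magnitude_index <= n() * 0.25, 0.3, var2)) %>%
      arrange(Row) %>%                        # restore the original row order
      select(any_of(names(df)))               # drop the helper columns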
    
      ID var1 var2 var3
    1  1    a  1.0   50
    2  2    b  1.0   40
    3  3    c  0.0   30
    4  4    d  0.0   45
    5  5    e  1.0   33
    6  5    f  0.3   51
    7  7    g  0.3   70
    8  8    h  0.0   46
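
As a variation on the answer above, the ranking can be done inside a single mutate() so the rows never have to be re-sorted. The sketch below is not part of the original answer: mutate_top_share() is a hypothetical helper name, and its prop and value arguments are assumptions introduced for illustration. It relies on dplyr's row_number(desc(var3)), which ranks rows by var3 from largest to smallest and breaks ties by order of appearance.

```r
library(dplyr)

df <- data.frame(ID = c(1, 2, 3, 4, 5, 5, 7, 8),
                 var1 = c('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'),
                 var2 = c(1, 1, 0, 0, 1, 1, 0, 0),
                 var3 = c(50, 40, 30, 45, 33, 51, 70, 46))

# Hypothetical helper (not from the original answer):
# set var2 to `value` for the top `prop` share of rows, ranked by var3.
mutate_top_share <- function(data, prop, value = 0.3) {
  data %>%
    mutate(
      # row_number(desc(var3)) is 1 for the largest var3, 2 for the next, ...
      var2 = if_else(row_number(desc(var3)) <= n() * prop, value, var2)
    )
}

# Update the top 25% of rows by var3; row order is left untouched.
result <- mutate_top_share(df, 0.25)
print(result)
```

Changing prop varies the share of rows that is modified. When n() * prop is not a whole number, the comparison effectively rounds down, so 25% of 10 rows would update 2 rows.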