Tags: r, dataframe, dplyr, sample

Add a value only to certain cases


I have a dataframe:

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
y <- c(2, 2, 2, 0, 0, 0, 0, 0, 2, 2,  2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
df <- data.frame(x, y)

Now I want to change values in x, but only for a random 10% of the rows where y equals 2. For example:

set.seed(999)
df[sample(which(df$y == 2), round(0.1 * length(which(df$y == 2)))), ]

     x y
 11 11 2
 14 14 2

For exactly these cases I want to add 1000 to x. The result should look like:

     x    y
 1   1    2
 2   2    2
 3   3    2
 4   4    0
 5   5    0
 6   6    0
 7   7    0
 8   8    0
 9   9    2
 10 10    2
 11 1011  2
 12 12    2
 13 13    2
 14 1014  2
 15 15    2
 16 16    2
 17 17    2
 18 18    2
 19 19    2
 20 20    2

I am able to edit the sub-sample, but I don't know how to write the result back into the data frame "df" in a neat way. I am grateful for any help!


Solution

  • One simple way using base R could be:

    # Logical vector marking the rows where y == 2
    inds <- df$y == 2
    
    # set.seed(123)  # uncomment for a reproducible sample
    # Randomly pick 10% of the row indices where y == 2
    inds_to_change <- sample(which(inds), round(0.1 * sum(inds)))
    
    # Add 1000 to x in the selected rows only
    df$x[inds_to_change] <- df$x[inds_to_change] + 1000
    
    df
    #      x y
    #1     1 2
    #2     2 2
    #3     3 2
    #4     4 0
    #5     5 0
    #6     6 0
    #7     7 0
    #8     8 0
    #9     9 2
    #10 1010 2
    #11   11 2
    #12   12 2
    #13   13 2
    #14   14 2
    #15   15 2
    #16   16 2
    #17 1017 2
    #18   18 2
    #19   19 2
    #20   20 2
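
If the same step is needed more than once, the base R approach above could be wrapped in a small function. The following is only a sketch: the name `add_to_random_subset` and its arguments are hypothetical and not part of the original answer. It also indexes into the candidate rows with `sample.int()` instead of passing them to `sample()` directly, because `sample(x, n)` treats a single number `x` as the range `1:x`, which would pick the wrong rows if only one row matched the condition.

```r
# Hypothetical helper (not from the original answer): add `amount` to `column`
# in a random fraction `frac` of the rows of `data` where `cond` is TRUE.
add_to_random_subset <- function(data, cond, column, frac = 0.1, amount = 1000) {
  candidates <- which(cond)                    # row indices that satisfy the condition
  n <- round(frac * length(candidates))        # number of rows to modify
  # Sample positions within `candidates`, then map them back to row indices.
  chosen <- candidates[sample.int(length(candidates), n)]
  data[[column]][chosen] <- data[[column]][chosen] + amount
  data
}

# Same data as in the question
x <- as.numeric(1:20)
y <- c(2, 2, 2, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
df <- data.frame(x, y)

set.seed(999)
result <- add_to_random_subset(df, df$y == 2, "x")
result
```

With the data from the question there are 15 rows where y equals 2, so `round(0.1 * 15)` selects 2 of them. Which two rows change depends on the seed and on the R version's sampling algorithm.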