Tags: r, random, downsampling

Randomly sampling a dataset to decrease its values to a given sum


I am trying to randomly decrease the values in a column so that they add up to a given sum. For example, if the original data looks like this:

ID Value

1 4
2 10
3 16

After running the code, the sum of Value should be 10, and this needs to be done randomly (the decrease for each member should be chosen randomly):

ID Value

1 1
2 8
3 1

I tried several commands and libraries but could not manage it. I am still a novice, and any help would be appreciated!

Thanks

Edit: Sorry, I was not clear enough. I would like to assign each observation a new, randomly chosen value that is smaller than the original, so that the new sum of Value equals 10.


Solution

  • Using the sample data

    dd <- read.table(text="ID Value
    1 4
    2 10
    3 16", header=TRUE)
    

    and the dplyr and tidyr libraries, you can do

    library(dplyr)
    library(tidyr)
    
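    dd %>%
      mutate(ID=factor(ID)) %>%               # factor keeps IDs that end up with 0 rows
      uncount(Value) %>%                      # one row per unit of Value
      sample_n(10) %>%                        # draw 10 rows without replacement
      count(ID, name = "Value", .drop=FALSE)  # count the sampled rows per ID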
    

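    Here we repeat each row once per unit of Value, randomly sample 10 of those rows without replacement, and then count the rows per ID to get the new values. We turn ID into a factor so that, together with .drop=FALSE, IDs with 0 sampled rows are preserved. Because the sampling is without replacement, a new value can never exceed the original one, although it can equal it or be 0.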
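
For comparison, the same idea can be sketched in base R without dplyr or tidyr. This is only a minimal sketch under stated assumptions: `downsample_to_sum` and its arguments are hypothetical names introduced here, not part of the answer above or of any package, and `set.seed(1)` is there only to make the draw repeatable.

```r
# Hypothetical helper: keep `target` units drawn at random, without
# replacement, from the pool of all units, then count them per row.
downsample_to_sum <- function(ids, values, target) {
  stopifnot(length(ids) == length(values),
            target >= 0, target <= sum(values))
  # One entry per unit: row index i appears values[i] times.
  pool <- rep(seq_along(values), times = values)
  # Index via sample.int() so a pool of length 1 is not misread as 1:n.
  kept <- pool[sample.int(length(pool), size = target)]
  # tabulate() counts the kept units per row index and keeps zero counts.
  data.frame(ID = ids, Value = tabulate(kept, nbins = length(values)))
}

dd <- data.frame(ID = 1:3, Value = c(4, 10, 16))

set.seed(1)  # assumption: fixed seed, only for reproducibility
res <- downsample_to_sum(dd$ID, dd$Value, target = 10)
print(res)
```

As with the dplyr version, each new value is at most the original one and the values sum to the target, but an ID can keep its full original value or drop to 0. If every new value must be strictly smaller than the original, or at least 1, the draw would need an extra constraint that this sketch does not implement.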