
Simulate a data frame with a different sequence in each column


I would like to simulate a data frame in R with 4 columns that meets the following conditions:

  1. Each row sums to 1.

  2. Column 1 starts close to 1, say 0.9, and gradually decreases with each row.

  3. The other 3 columns start low, say somewhere between 0.02 and 0.05, and gradually increase with each row.

  4. The last row of the data frame is c(0.25, 0.25, 0.25, 0.25).

Can you help me create something like that? Thank you in advance!
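The four conditions can be checked mechanically, which is handy for testing any candidate simulation. Below is a minimal sketch in R; `check_conditions()` and its `tol` argument are hypothetical names, not part of the question. Because the starting values in conditions 2 and 3 are only examples ("say 0.9"), the sketch checks the direction of change in each column rather than exact starting values.

```r
# Hypothetical helper: returns one TRUE/FALSE per condition for a data frame
# (or matrix) with 4 numeric columns. Starting values are treated as soft
# targets and are not checked.
check_conditions <- function(df, tol = 1e-8) {
  m <- as.matrix(df)
  n <- nrow(m)
  # Condition 1: every row sums to 1 (up to floating-point tolerance)
  sums_ok <- all(abs(rowSums(m) - 1) < tol)
  # Condition 2: column 1 never increases from one row to the next
  col1_ok <- all(diff(m[, 1]) <= tol)
  # Condition 3: columns 2 to 4 never decrease (diff() works column-wise on a matrix)
  rest_ok <- all(diff(m[, 2:4, drop = FALSE]) >= -tol)
  # Condition 4: the last row is c(0.25, 0.25, 0.25, 0.25)
  last_ok <- all(abs(m[n, ] - 0.25) < tol)
  c(rows_sum_to_1 = sums_ok, col1_decreasing = col1_ok,
    others_increasing = rest_ok, last_row_quarters = last_ok)
}

ok_df <- data.frame(x1 = c(0.9, 0.6, 0.25),
                    x2 = c(0.03, 0.13, 0.25),
                    x3 = c(0.03, 0.13, 0.25),
                    x4 = c(0.04, 0.14, 0.25))
check_conditions(ok_df)  # all four entries are TRUE
```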


Solution

    # State the number of rows you want to create
    rows <- 1000
    
    library(dplyr)
    
    # Create 4 randomly generated columns, the first drawn from [.25, 1] and the
    # others from [0, .25], and append .25 to each as the last row
    x1 <- c(runif(rows - 1, .25, 1), .25)
    x2 <- c(runif(rows - 1, 0, .25), .25)
    x3 <- c(runif(rows - 1, 0, .25), .25)
    x4 <- c(runif(rows - 1, 0, .25), .25)
    
    # Combine the columns into a single table
    df <- data.frame(x1, x2, x3, x4)
    
    df <- df %>%
      # Sort each column individually (sorting one column doesn't affect another):
      # x1 decreasing, x2 to x4 increasing
      mutate(x1 = sort(x1, decreasing = TRUE),
             across(x2:x4, sort))
    
    # Sort the values within each row in decreasing order. apply() returns one
    # column per row, so transpose the result, then restore the column names
    df <- as.data.frame(t(apply(df, 1, sort, decreasing = TRUE)))
    names(df) <- c("x1", "x2", "x3", "x4")
    
    # Scale the table so the sum of each row equals 1. Dividing by rowSums()
    # in a single step uses the same row total for every column
    df <- df / rowSums(df)
    
    tail(df)
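The random-draw approach above only approximates conditions 2 and 3: with 1000 rows the first value of column 1 tends to land much closer to 1 than to 0.9, and because each row is divided by a different total, columns 2 to 4 are not guaranteed to keep increasing after the rescaling. If the values do not need to be random after the first row, a deterministic alternative is to interpolate linearly between a starting row and the required last row. A minimal sketch follows; `simulate_shares()` and its parameters are hypothetical names, and drawing the starting values of columns 2 to 4 from [0.02, 0.05] is an assumption based on the question's "say" values.

```r
# Sketch, assuming the rows may be deterministic after the first one.
# simulate_shares() is a hypothetical helper, not an existing function.
simulate_shares <- function(rows = 1000, start_lo = 0.02, start_hi = 0.05) {
  stopifnot(rows >= 2)
  # Columns 2-4 start at random values in [start_lo, start_hi]; column 1 takes
  # the remainder so the first row sums to 1 (0.85 to 0.94 with the defaults)
  start_rest <- runif(3, start_lo, start_hi)
  start <- c(1 - sum(start_rest), start_rest)
  end <- rep(0.25, 4)

  # Weight on the end row: 0 in the first row, 1 in the last row
  w <- seq(0, 1, length.out = rows)

  # outer(1 - w, start)[i, j] is (1 - w[i]) * start[j], so row i equals
  # (1 - w[i]) * start + w[i] * end; since start and end both sum to 1,
  # so does every row
  m <- outer(1 - w, start) + outer(w, end)
  df <- as.data.frame(m)
  names(df) <- paste0("x", 1:4)
  df
}

set.seed(1)
df <- simulate_shares(1000)
head(df, 3)
tail(df, 3)
```

Each column is a straight line between its start value and 0.25, so column 1 decreases and columns 2 to 4 increase in every step. If some noise is wanted, one option is to perturb the rows and rescale them with `df / rowSums(df)`, at the cost of losing that monotonicity guarantee.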