Tags: r, distribution, simulate, probabilistic-programming

Simulate fat tail data in R


I need to simulate data in R with a fat-tailed distribution, and having never simulated data before, I'm not sure where to start. I have looked into the FatTailsR package, but the documentation is pretty cryptic and I can't find any obvious tutorials.

Basically, I want to create an artificial data frame with two columns (X and Y) and 10,000 observations, built with the following iterative logic:

  • For each observation of X, there is a 75% probability that Y is 0 and a 25% probability that Y is 1 (so every observation is assigned a 0 or a 1).
  • Next, look only at the observations where Y is 1. Each of these observations (25% of the original dataset) has a 25% probability of being bumped up to Y = 2.
  • Of the observations where Y is 2, 25% get bumped up to 3.
  • And so on, iterating up to Y = 10.

Any guidance would be appreciated, including suggestions of packages and functions to check out (maybe something like rlnorm?).


Solution

  • This might work (not super-efficient, but ...)

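    First figure out the probabilities of each outcome (P(1)=0.75, P(2)=0.75*0.25, P(3)=0.75*0.25^2 ...). Note that outcome i here corresponds to Y = i-1 in the question, so the ten outcomes cover Y = 0 to 9; the leftover probability of reaching Y = 10 is 0.25^10 (about 1e-6), which this setup ignores.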

    # cc[i] = 0.75 * 0.25^(i-1): probability of outcome i
    cc <- cumprod(c(0.75, rep(0.25, 9)))
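
    As a cross-check, here is a minimal sketch of the exact probabilities implied by the question's rule for Y = 0 to 10, including the small leftover mass at Y = 10. The name `p_y` is introduced here purely for illustration.

    ```r
    # k bump-ups (probability 0.25 each) followed by one stop (probability 0.75)
    p_y <- c(0.75 * 0.25^(0:9), 0.25^10)   # Y = 0..9, then Y = 10 (no further bump-up possible)
    names(p_y) <- 0:10
    sum(p_y)                               # should be 1 up to floating-point error
    # The first ten entries agree with the cumprod() construction
    cc <- cumprod(c(0.75, rep(0.25, 9)))
    all.equal(unname(p_y[1:10]), cc)
    ```

    Because the Y = 10 mass is only 0.25^10, dropping it makes no practical difference at sample sizes like 1,000 or 10,000.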
    

    Choose a multinomial deviate with these probabilities (N=1 for each sample)

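    # rmultinom() returns one column per draw, so transpose to get one row per draw
    # (1000 draws here; use 10000 for the sample size asked for in the question)
    rr <- t(rmultinom(1000, size=1, prob=cc))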
    

    Figure out which value in each row is equal to 1:

    # Each row holds a single 1; convert to logical so which() returns its position
    storage.mode(rr) <- "logical"
    out <- apply(rr, 1, which)
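
    An arguably simpler route, sketched here as an alternative rather than as part of the approach above, is to draw the outcome indices directly with `sample()` and shift them so they match the question's Y values. The column `X` is just a hypothetical observation ID, since the question does not say what X should contain, and the seed is arbitrary.

    ```r
    set.seed(101)                          # arbitrary seed, for reproducibility only
    n  <- 10000                            # sample size requested in the question
    cc <- cumprod(c(0.75, rep(0.25, 9)))   # outcome probabilities
    # sample() rescales 'prob' internally; outcome i corresponds to Y = i - 1
    idx <- sample(seq_along(cc), size = n, replace = TRUE, prob = cc)
    dd  <- data.frame(X = seq_len(n), Y = idx - 1)
    table(dd$Y)
    ```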
    

    Check results:

    tt <- table(factor(out,levels=1:10))
    tt
      1   2   3   4   5   6   7   8   9  10 
    756 183  43  14   3   1   0   0   0   0 
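
    To judge whether these counts look plausible, they can be set next to the expected counts, which are the number of draws times each outcome probability. This is only a rough sketch; the `observed` vector is copied from the table above.

    ```r
    cc <- cumprod(c(0.75, rep(0.25, 9)))
    observed <- c(756, 183, 43, 14, 3, 1, 0, 0, 0, 0)   # counts from the table above
    # Rescale cc to sum to 1, as rmultinom() does internally
    expected <- sum(observed) * cc / sum(cc)
    cmp <- data.frame(outcome = 1:10, observed = observed,
                      expected = round(expected, 1))
    print(cmp)
    ```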
    

    There might be a cleverer way to set this up in terms of a modified geometric distribution ...
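
    One such setup, offered as a sketch rather than a definitive implementation: `rgeom()` counts the failures before the first success, so with success probability 0.75 it gives P(Y=k) = 0.75*0.25^k, and `pmin()` caps the result at 10, which puts all the remaining mass (0.25^10) on Y = 10 as the question's rule implies. The column `X` is a hypothetical observation ID and the seed is arbitrary.

    ```r
    set.seed(101)                          # arbitrary seed, for reproducibility only
    n <- 10000                             # sample size requested in the question
    # Number of "bump-ups" before stopping, capped at the maximum value of 10
    y  <- pmin(rgeom(n, prob = 0.75), 10)
    dd <- data.frame(X = seq_len(n), Y = y)
    table(factor(dd$Y, levels = 0:10))
    ```

    This version is fully vectorized and produces Y on the 0 to 10 scale directly, so no index shifting is needed.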