Search code examples
rrandomresamplingstatistics-bootstrap

Drawing a sample that changes the shape of the mother sample


Background:

I'm trying to modify the shape of a histogram resulted from an "Initial" large sample obtained using Initial = rbeta(1e5, 2, 3). Specifically, I want the modified version of the Initial large sample to have 2 additional smaller (in height) "humps" (i.e., another 2 smaller-height peaks in addition to the one that existed in the Initial large sample).

Coding Question:

I'm wondering how to manipulate sample() (maybe using its prob argument) in R base so that this command samples in a manner that the two additional humps be around ".5" and ".6" on the X-Axis?

Here is my current R code:

Initial = rbeta(1e5, 2, 3) ## My initial Large Sample

hist (Initial)             ## As seen, here there is only one "hump" say near 
                            # less than ".4" on the X-Axis


Modified.Initial = sample(Initial, 1e4 ) ## This is meant to be the modified version of the
                                          # the Initial with two additional "humps"

hist(Modified.Initial)          ## Here, I need to see two additional "humps" near  
                                 # ".5" and ".6" on the X-Axis

Solution

  • You can adjust the density distribution by combining it with beta distributions with the desired modes for a smoothed adjustment.

    set.seed(47)
    
    Initial = rbeta(1e5, 2, 3)
    d <- density(Initial)
    
    # Generate densities of beta distribution. Parameters determine center (0.5) and spread.
    b.5 <- dbeta(seq(0, 1, length.out = length(d$y)), 50, 50)
    b.5 <- b.5 / (max(b.5) / max(d$y))    # Scale down to max of original density
    
    # Repeat centered at 0.6
    b.6 <- dbeta(seq(0, 1, length.out = length(d$y)), 60, 40)
    b.6 <- b.6 / (max(b.6) / max(d$y))
    
    # Collect maximum densities at each x to use as sample probability weights
    p <- pmax(d$y, b.5, b.6)
    
    plot(p, type = 'l')
    

    # Sample from density breakpoints with new probability weights
    Final <- sample(d$x, 1e4, replace = TRUE, prob = p)
    

    Effects on histogram are subtle...

    hist(Final)
    

    ...but are more obvious in the density plot.

    plot(density(Final))
    

    Obviously all adjustments are arbitrary. Please don't do terrible things with your power.