Based on a sample of values of a random variable, I build an estimate of its density function using kernel density estimation.
from scipy.stats import gaussian_kde
cdf = gaussian_kde(sample)
What I need is to generate new sample values of a random variable whose density function is equal to the constructed estimate (cdf above). I know about the approach of inverting the cumulative distribution function, but since I cannot do that analytically, it requires fairly complicated preparation. Is there a built-in solution, or maybe another way to accomplish the task?
If you're using a kernel density estimator (KDE) with Gaussian kernels, your density estimate is a Gaussian mixture model. This means that the density function is a weighted sum of 'mixture components', where each mixture component is a Gaussian distribution. In a typical KDE, there's a mixture component centered over each data point, and each component is a copy of the kernel. This distribution is easy to sample from without using the inverse CDF method. The procedure looks like this:
1. Setup
   - Let mu be a vector where mu[i] is the mean of mixture component i. In a KDE, this will just be the locations of the original data points.
   - Let sigma be a vector where sigma[i] is the standard deviation of mixture component i. In typical KDEs, this will be the kernel bandwidth, which is shared for all points (but variable-bandwidth variants do exist).
   - Let w be a vector where w[i] contains the weight of mixture component i. The weights must be positive and sum to 1. In a typical, unweighted KDE, all weights will be 1/(number of data points) (but weighted variants do exist).
2. Choose the number of random points to sample, n_total.
3. Determine how many points will be drawn from each mixture component.
   - Let n be a vector where n[i] contains the number of points to sample from mixture component i.
   - Draw n from a multinomial distribution with "number of trials" equal to n_total and "success probabilities" equal to w. This means the number of points to draw from each mixture component will be randomly chosen, proportional to the component weights.
4. Draw random values
   - For each mixture component i: draw n[i] values from a normal distribution with mean mu[i] and standard deviation sigma[i].
5. Shuffle the list of random values, so they have random order.
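The procedure above can be sketched in Python with NumPy. This is a minimal illustration rather than a definitive implementation: the function name sample_gaussian_mixture, the stand-in array data, and the bandwidth h = 0.5 are assumptions made for the example. In particular, h is an arbitrary value, not the bandwidth that gaussian_kde would select.

```python
import numpy as np

def sample_gaussian_mixture(mu, sigma, w, n_total, rng=None):
    """Draw n_total values from a 1-D Gaussian mixture (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    # Decide how many points come from each component (multinomial counts).
    n = rng.multinomial(n_total, w)
    # For each component i, draw n[i] values from Normal(mu[i], sigma[i]).
    draws = [rng.normal(mu[i], sigma[i], size=n[i]) for i in range(len(mu))]
    values = np.concatenate(draws)
    # Shuffle in place so the values are not grouped by component.
    rng.shuffle(values)
    return values

# Setup for an unweighted KDE with a shared bandwidth.
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=200)  # stand-in for the original sample
h = 0.5                                          # assumed bandwidth, for illustration only
mu = data                                        # one component per data point
sigma = np.full(data.size, h)                    # same standard deviation everywhere
w = np.full(data.size, 1.0 / data.size)          # equal weights that sum to 1

new_values = sample_gaussian_mixture(mu, sigma, w, n_total=1000, rng=rng)
```

The loop over components keeps the code close to the written steps. For large samples, a vectorized variant (repeating each component index n[i] times and drawing all values in one call) would likely be faster.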
This procedure is relatively straightforward because random number generators (RNGs) for multinomial and normal distributions are widely available. If your kernels aren't Gaussian but some other probability distribution, you can replicate this strategy, replacing the normal RNG in step 4 with an RNG for that distribution (if it's available). You can also use this procedure to sample from mixture models in general, not just KDEs.
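As a rough sketch of that generalization, the example below swaps the normal RNG for a uniform (boxcar) kernel and uses unequal weights. It assumes the kernel is a location-scale family, so a draw from component i can be written as mu[i] + scale[i] * (a draw from the unit kernel). The names sample_mixture, draw_unit, and boxcar are hypothetical and introduced only for illustration.

```python
import numpy as np

def sample_mixture(mu, scale, w, n_total, draw_unit, rng):
    """Sample a mixture of location-scale components (hypothetical helper).

    draw_unit(rng, size) must return `size` draws from the kernel
    with location 0 and scale 1.
    """
    mu = np.asarray(mu, dtype=float)
    scale = np.asarray(scale, dtype=float)
    # Number of points to take from each component.
    n = rng.multinomial(n_total, w)
    # Component index for every draw, e.g. [0, 0, 1, 1, 1, ...].
    idx = np.repeat(np.arange(mu.size), n)
    # Shift and scale unit-kernel noise by each draw's component parameters.
    values = mu[idx] + scale[idx] * draw_unit(rng, n_total)
    rng.shuffle(values)
    return values

def boxcar(rng, size):
    # Unit uniform kernel on [-1, 1).
    return rng.uniform(-1.0, 1.0, size)

rng = np.random.default_rng(1)
samples = sample_mixture(
    mu=[0.0, 10.0], scale=[0.5, 0.5], w=[0.25, 0.75],
    n_total=2000, draw_unit=boxcar, rng=rng,
)
```

Passing the kernel as a callable keeps the multinomial and shuffle steps unchanged, so only the one function that draws from the kernel needs to be replaced.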