python seaborn probability distribution

Easier way to transform array into a distribution?


I have an array, vox_betas, that contains 21600 floats (ranging from ~0 to ~2). When plotted against a second array, features, the data show a clear structure (see 1st pic).

I want a single array that reflects this structure. Essentially, I want to call sns.distplot() on it and get the same shape as the first picture. Right now sns.distplot(vox_betas) produces the 2nd picture, which is not what I want.

I was able to get the third picture by creating the array dist, but my approach is sloppy and even loses some information (my code is below).

How would you transform vox_betas and features into dist? Does anyone have any ideas?

plt.scatter(features,vox_betas)

[1st picture: scatter plot of vox_betas against features]

sns.distplot(vox_betas)

[2nd picture: output of sns.distplot(vox_betas)]

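import numpy as np

# Repeat each unique feature value in proportion to the summed betas at that value.
# The factor of 10 sets the resolution; truncating to an integer count loses information.
dist = []
for f in np.unique(features):
    # np.repeat needs an integer count, so the scaled sum is truncated with int()
    count = int(np.sum(vox_betas[features == f] * 10))
    dist = np.concatenate((dist, np.repeat(f, count)))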

sns.distplot(dist)

[3rd picture: output of sns.distplot(dist)]


Solution

  • This is called inverse transform sampling:

    Inverse transform sampling is a basic method for pseudo-random number sampling, i.e., for generating sample numbers at random from any probability distribution given its cumulative distribution function.

    The best explanation I found is this one. Also discussed here.
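
As a rough illustration of how inverse transform sampling could be applied to the question's data, here is a minimal sketch. It is not taken from the linked explanations. The function name sample_from_weights and the n_samples and seed parameters are hypothetical, and the sketch assumes that features and vox_betas are 1-D arrays of equal length and that the betas are non-negative, so that the summed betas per feature value can be treated as unnormalized probabilities.

```python
import numpy as np

def sample_from_weights(features, vox_betas, n_samples=10_000, seed=0):
    """Draw feature values with probability proportional to their summed betas.

    Hypothetical helper for illustration; assumes non-negative betas.
    """
    features = np.asarray(features).ravel()
    vox_betas = np.asarray(vox_betas, dtype=float).ravel()

    # Sum the betas that belong to each unique feature value.
    values, inverse = np.unique(features, return_inverse=True)
    weights = np.bincount(inverse, weights=vox_betas, minlength=len(values))

    # Normalise to a probability mass function, then accumulate into a CDF.
    cdf = np.cumsum(weights / weights.sum())

    # Inverse transform: each uniform draw in [0, 1) is mapped to the
    # first feature value whose cumulative probability exceeds it.
    rng = np.random.default_rng(seed)
    u = rng.random(n_samples)
    idx = np.searchsorted(cdf, u, side="right")
    idx = np.minimum(idx, len(values) - 1)  # guard against round-off in the last bin
    return values[idx]
```

Under these assumptions, dist = sample_from_weights(features, vox_betas) could be passed to sns.distplot(dist) in place of the loop from the question. The weights are never scaled and truncated to integer counts, so the resolution depends only on n_samples.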