Tags: statistics, downsampling, subsampling, statistical-sampling

Sample to Create Uniform Distribution from Non-Uniform Data


Given a dataset with a non-uniform (highly peaked) distribution, I want to resample it to create a new dataset with an approximately uniform distribution. My approach:

  1. Divide the data into bins.
  2. Set the target bin level to the smallest number of samples in any bin.
  3. Randomly delete samples from each bin until its count equals the target bin level.
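The three steps above can be sketched with NumPy as follows. This is a minimal illustration, not a reference implementation: it assumes equal-width bins from `np.histogram_bin_edges`, and the helper name `flatten_by_binning` and its `n_bins` parameter are hypothetical. It also skips empty bins when choosing the target, since an empty bin would otherwise force the target to zero; that choice is an assumption beyond the question.

```python
import numpy as np

def flatten_by_binning(data, n_bins=20, rng=None):
    """Downsample `data` so every non-empty bin keeps the same number of points.

    Hypothetical helper: bin the data, take the smallest non-empty bin count
    as the target, then randomly keep exactly `target` points per bin.
    """
    rng = np.random.default_rng(rng)
    data = np.asarray(data)
    edges = np.histogram_bin_edges(data, bins=n_bins)
    # Digitizing against the interior edges yields bin indices 0..n_bins-1,
    # matching np.histogram's convention (last bin includes the maximum).
    idx = np.digitize(data, edges[1:-1])
    counts = np.bincount(idx, minlength=n_bins)
    # Assumption: ignore empty bins, otherwise the target would be 0.
    target = counts[counts > 0].min()
    keep = []
    for b in range(n_bins):
        members = np.flatnonzero(idx == b)
        if members.size:
            # Randomly choose which points survive in this bin.
            keep.append(rng.choice(members, size=target, replace=False))
    return data[np.sort(np.concatenate(keep))]

# Example: a strongly peaked (normal) sample
peaked = np.random.default_rng(0).normal(size=10_000)
flat = flatten_by_binning(peaked, n_bins=10, rng=1)
counts, _ = np.histogram(flat, bins=np.histogram_bin_edges(peaked, bins=10))
print(len(peaked), "->", len(flat), "samples; per-bin counts:", counts)
```

Note that the sparsest bin (usually a tail bin) sets the size of the result, so with a highly peaked distribution most of the data is discarded.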

Is there a better technique?


Solution

  • We know that for a uniform distribution on the interval [a, b] we have

    mean = (a+b) / 2

    variance = (b-a)^2 / 12

    So you could construct a uniform distribution with these parameters and sample from it directly, setting either a = min(data) and b = max(data), or perhaps a = mean(lowest_bin) and b = mean(highest_bin). Unlike the binning approach, this generates new values rather than keeping a subset of the original samples. How you set a and b depends on your data and what you want to accomplish.
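A minimal NumPy sketch of this suggestion, assuming a = min(data) and b = max(data). The helper name `uniform_like` and its default of drawing `len(data)` values are hypothetical choices for illustration. The example also compares the sample mean and variance with the formulas above.

```python
import numpy as np

def uniform_like(data, size=None, rng=None):
    """Draw new values from Uniform(a, b) with a = min(data), b = max(data).

    Hypothetical helper returning (samples, a, b). Defaulting `size` to
    len(data) is an assumption, not part of the original answer.
    """
    rng = np.random.default_rng(rng)
    data = np.asarray(data, dtype=float)
    a, b = data.min(), data.max()
    n = data.size if size is None else size
    # Generator.uniform samples the half-open interval [a, b)
    return rng.uniform(a, b, size=n), a, b

# Example: replace a peaked sample with uniform draws over the same range
peaked = np.random.default_rng(0).normal(size=10_000)
samples, a, b = uniform_like(peaked, size=200_000, rng=1)

# Compare sample moments with mean = (a+b)/2 and variance = (b-a)^2/12
print("mean:", samples.mean(), "expected:", (a + b) / 2)
print("variance:", samples.var(), "expected:", (b - a) ** 2 / 12)
```

To use the bin-mean bounds instead, compute a and b from the lowest and highest bins yourself and pass them straight to `rng.uniform(a, b, size=n)`.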