Search code examples
swiftrandomdistribution

Function to change random distribution in Swift?


Does Swift have some built in methods to change the distribution of random numbers? I want to use a linear equation to define the distribution, something like y = k * x + m. When k = 0, all numbers should be equally distributed. When k = 1 the distribution should follow the line, so low x-values would be very rare while high x-values would be common. I played around in Excel and tried different strategies and finally came up with this code, which seems to work - but there must be a neater way to do this in Swift?

As a note: First I did use an array of ClosedRange instead of the tuples-approach, then used .contains. Then I changed it to an array of tuples because my code didn't work as expected. Probably another bug, but I stayed with tuples as the code works now.

import Foundation

/* function to create an array of tuples with upper and lower
 limits based on a linear distribution (y = k * x + m) */
func createDistributions(numbers: ClosedRange<Int>, k: Double) -> [(Double, Double)] {
    var dist = [(Double, Double)]()
    let m: Double = 0.5
    let nVal: Int = numbers.count
    var pStop: Double = 0.0

    for x in numbers {
        let t = (Double(x) + 0.5) / Double(nVal)
        let y = (k * (t - 0.5) + m) * 2.0 / Double(nVal)
        let start = pStop
        let stop = y + start
        
        dist.append((start, stop))
        pStop = stop
    }
    
    return dist
}

// create distributions based on k-value of 1.0
var result = createDistributions(numbers: 0...34, k: 1.0)


// loop ten times, creating a Double random number each time
for _ in 0...9 {
    let ran = Double.random(in: 0...1)
    
    // check in which indexed array the random number belongs to by checking lower and upper limit
    for i in 0..<result.count {
        
        // the random number belongs to the i:th element, print i
        if ran >= result[i].0 && ran <= result[i].1 {
            print(i)
        }
    }
}

Solution

  • Your y = kx+m is a probability density function (PDF). A very good way to apply this to random number generation is an inverse transform sampling function. I'll walk through how to develop that, step by step, so that you can adapt it to your specific needs. In the general case, this would be done with first-year calculus, but for the linear case it's easy enough to do with basic algebra and some grade-school geometry. For this example, I'll be generating a random value between 0 and 1.

    (For other Americans reading along: this is the slope-intercept form we learn as y = mx+b. Please don't get confused that m here is the intercept, not the slope. I hopefully haven't mixed them up anywhere in this answer.)

    To experiment with this answer, see the GeoGebra worksheet that the images come from.

    The TL;DR of all of this is:

    let u = Double.random(in: 0...1)
    if k == 0 {
        return u
    } else {
        return (sqrt(k*k + k*(8*u - 4) + 4) + k - 2)/(2*k)
    }
    

    But learning why that's the answer is the real goal.

    A PDF is a function whose area between two x-values is the probability that the value is between those values. This leads to the fact that the PDF must be positive for all values in its range, and the area under it must be exactly 1 (representing a 100% chance of selecting some value over the entire range).

    But a quick look at an arbitrary version of this curve shows that it may not have the right area:

    Graph of y=kx+m with k=0.4 and m=0.4

    For a given value of k there is a specific value of m that is valid. We can work that out by computing the area in terms of k and m, setting it to 1, and then solving for m. The area of the graph is a rectangle with base 1 (the range 0-1 of values we'll select) and a height m, plus a triangle of base 1, and height k. So:

    Area = Rectangle + Triangle = 1
           m + k/2 = 1
           m = 1 - k/2
    

    And substituting back into F(x):

    F(x) = kx + 1 - k/2
    

    We are also constrained that m cannot be less than 0, which constrains k to the range [0,2]. When k is 0, then all values are equally likely. When k is 2, there is a linear relationship between a value and its likelihood.

    Constrained PDF with linear relationship

    With a valid PDF, it is time to create a cumulative distribution function. This is a function that represents the likelihood that a randomly selected value is no greater than the given value. These functions are constrained for the same reasons as PDFs. They must monotonically increase from zero to one over the valid range.

    CDF from 0 to 0.3 = 0.174

    This area can be computed just like the full area, by summing a rectangle and a triangle:

    CDF(x) = Rectangle + Triangle
           = mx + (x/2 * (F(x) - m))
           = ... some algebra later ...
           = (k*x^2)/2 + (1-k/2)*x
    

    CDF function that intersects 0,0 and 1,1

    Notice that this function correctly passes through (0,0) and (1,1) as it must, and is positive over the entire range. There is no chance of a value less than zero, and there is 100% probability of a value less than or equal to one.

    Almost there. An inverse transform sample applies the inverse of the CDF. That's not particularly complicated, but it is a lot of algebra, so let WolframAlpha do it:

    solve y = (k*x^2)/2 + (1-k/2)*x for x
    ==>
    x = y and k = 0
    x = -(sqrt(k^2 + 8 k y - 4 k + 4) - k + 2)/(2 k) and k!=0
    x = (sqrt(k^2 + k (8 y - 4) + 4) + k - 2)/(2 k) and k!=0
    

    For k=0, x=y. Elsewhere, there are two solutions. Only positive values make sense here, so ignore the negative one.

    Inverse of CDF for k=1.5

    The red line is the function you want (this is for k=1.5). Very long road to get here, but now the code is pretty easy:

    // `k` ranges from 0 to 2, which is confusing. Map it to range 0...1
    func randomValue(distribution d: Double) -> Double {
        assert((0...1).contains(d))
        let u = Double.random(in: 0...1)
    
        // k ranges from 0 to 2
        let k = d * 2
    
        if k == 0 {
            return u
        } else {
            return (sqrt(k*k + k*(8*u - 4) + 4) + k - 2)/(2*k)
        }
    }
    

    And just to test it out:

    func testRun(distribution d: Double) {
        print("Distribution for \(d)")
        let n = 10_000
    
        // How many results begin with a given digit after the decimal point?
        var h: [Substring:Int] = [:]
        for _ in 0..<n {
            let value = randomValue(distribution: d)
            let firstDigit = "\(value)".prefix(3).suffix(1)
            h[firstDigit, default: 0] += 1
        }
    
        for digit in h.keys.sorted() {
            let ratio = Double(h[digit]!)/Double(n)
            print("\(digit) -> \(ratio.formatted(.percent.precision(.fractionLength(0))))")
        }
    }
    
    testRun(distribution: 0)
    testRun(distribution: 0.5)
    testRun(distribution: 1)
    
    ===>
    Distribution for 0.0
    0 -> 10%
    1 -> 10%
    2 -> 10%
    3 -> 11%
    4 -> 10%
    5 -> 10%
    6 -> 10%
    7 -> 10%
    8 -> 10%
    9 -> 10%
    Distribution for 0.5
    0 -> 6%
    1 -> 6%
    2 -> 7%
    3 -> 9%
    4 -> 10%
    5 -> 11%
    6 -> 11%
    7 -> 13%
    8 -> 13%
    9 -> 14%
    Distribution for 1.0
    0 -> 1%
    1 -> 3%
    2 -> 5%
    3 -> 7%
    4 -> 9%
    5 -> 11%
    6 -> 13%
    7 -> 15%
    8 -> 17%
    9 -> 19%
    

    A linear equation can only push this so far. I don't believe you can get a larger difference between low-probability and high-probability values with only a linear PDF (a better mathematician may correct me here; this isn't my speciality). If you want that, I would explore applying this to larger-order polynomials. Rather than F(x) = kx + m, you could do the same thing with F(x) = kx^2 + m or even higher powers. This will take some first-year calculus, but the overall approach should be similar.