
Weighted sampling without replacement using gonum


I have a big array of items and another array of weights of the same size. I would like to sample without replacement from the first array based on the weights from the second array. Is there a way to do this using gonum?


Solution

  • sampleuv.Weighted (package gonum.org/v1/gonum/stat/sampleuv) and its method Take() look like exactly what you want.

    From the doc:

    func NewWeighted(w []float64, src *rand.Rand) Weighted
    

    NewWeighted returns a Weighted for the weights w. If src is nil, rand.Rand is used as the random source. Note that sampling from weights with a high variance or overall low absolute value sum may result in problems with numerical stability.

    func (s Weighted) Take() (idx int, ok bool)
    

    Take returns an index from the Weighted with probability proportional to the weight of the item. The weight of the item is then set to zero. Take returns false if there are no items remaining.

    Because each call zeroes the weight of the item it returns, that item can never be drawn again, so repeated calls to Take give you sampling without replacement.

    Create a Weighted from your weights with NewWeighted, call Take to draw an index with probability proportional to its weight, and use that index to pick the item from your samples array. Call Take again for each further item you need, and stop when it returns ok == false.


    Working example:

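    package main
    
    import (
        "fmt"
        "time"
    
        "golang.org/x/exp/rand"
    
        "gonum.org/v1/gonum/stat/sampleuv"
    )
    
    func main() {
        samples := []string{"hello", "world", "what's", "going", "on?"}
        weights := []float64{1.0, 0.55, 1.23, 1, 0.002}
    
        // Seed from the clock so each run draws a different sample.
        w := sampleuv.NewWeighted(
            weights,
            rand.New(rand.NewSource(uint64(time.Now().UnixNano()))), // trailing comma is required when ")" is on its own line
        )
    
        // Each Take zeroes the chosen weight, so repeated calls never return
        // the same index twice. Draw up to 3 items; ok is false once every weight is zero.
        for n := 0; n < 3; n++ {
            i, ok := w.Take()
            if !ok {
                break
            }
            fmt.Println(samples[i])
        }
    }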