Tags: go, random, slice, sampling

Sample without replacement in golang


What's the best way to sample without replacement from a slice in golang?

a := make([]int, 100)
for i := range a {
    a[i] = i
}

// TODO: sample 5 elements from a without replacement.

Solution

  • If the slice is relatively small, or you are sampling a large portion of it, the simplest method is to shuffle the elements in place and take the first n:

    rand.Shuffle(len(a), func(i, j int) { a[i], a[j] = a[j], a[i] })
    fmt.Println(a[:5])
    

    https://play.golang.org/p/lQx44Mn9RQL
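
If the original order of the slice has to survive, one possible variation is to shuffle indexes rather than elements. The sketch below uses rand.Perm from math/rand for that; the helper name sampleByPerm is made up for illustration, and it assumes the slice is small enough that building a full permutation of its indexes is acceptable.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sampleByPerm (hypothetical helper) returns n elements of a chosen
// without replacement. It never writes to a: it shuffles a permutation
// of the indexes instead and reads the first n of them.
func sampleByPerm(a []int, n int) []int {
	// Clamp n so the slice expression below cannot go out of range.
	if n > len(a) {
		n = len(a)
	}
	if n < 0 {
		n = 0
	}

	samples := make([]int, 0, n)
	for _, idx := range rand.Perm(len(a))[:n] {
		samples = append(samples, a[idx])
	}
	return samples
}

func main() {
	a := make([]int, 100)
	for i := range a {
		a[i] = i
	}
	fmt.Println(sampleByPerm(a, 5))
}
```

This still costs time and memory proportional to len(a), so it only fits the same "small set" case as the shuffle.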

    If you don't want to shuffle the entire set, but it's acceptable to alter its order (or to copy it first), you can record the used values more cheaply by removing each one from the slice as it is drawn: swap it with the last element, then shrink the slice by one.

    n := 5 // number of samples to draw
    
    // create a copy of the slice header; c shares a's backing array,
    // so the swaps below also reorder a
    c := a
    samples := make([]int, n)
    
    for i := 0; i < n; i++ {
        r := rand.Intn(len(c))
        samples[i] = c[r]
    
        // remove the sample from c: swap it to the end, then shrink c
        c[r], c[len(c)-1] = c[len(c)-1], c[r]
        c = c[:len(c)-1]
    }
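
For the "copy the entire set" option mentioned above, a minimal sketch might wrap the same removal loop in a function that works on its own copy. The name sampleByRemoval and the clamping of n are assumptions added for illustration, not part of the original answer.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sampleByRemoval (hypothetical helper) draws n elements from a without
// replacement. It operates on a private copy, so the caller's slice
// keeps its original order.
func sampleByRemoval(a []int, n int) []int {
	if n > len(a) {
		n = len(a)
	}
	if n < 0 {
		n = 0
	}

	// Copy the elements, not just the slice header.
	c := make([]int, len(a))
	copy(c, a)

	samples := make([]int, 0, n)
	for i := 0; i < n; i++ {
		r := rand.Intn(len(c))
		samples = append(samples, c[r])

		// Remove the drawn element: overwrite it with the last one
		// and shrink the working slice by one.
		c[r] = c[len(c)-1]
		c = c[:len(c)-1]
	}
	return samples
}

func main() {
	a := make([]int, 100)
	for i := range a {
		a[i] = i
	}
	fmt.Println(sampleByRemoval(a, 5))
}
```

The copy costs O(len(a)) once; after that, each draw is constant time and never needs a retry.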
    

    If the set is very large and you are sampling only a small portion of it, you can sample from the original set without modifying it by recording each chosen index and redrawing whenever an index repeats. As the ratio of the sample size to the set size grows, collisions become more frequent and this approach becomes less efficient.

    For example:

    // record indexes here to prevent duplicates
    indexes := make(map[int]bool, n)
    
    // draw random indexes until n distinct ones are recorded;
    // a repeated index leaves the map unchanged, so it is simply redrawn
    // (this never terminates if n > len(a))
    for len(indexes) < n {
        indexes[rand.Intn(len(a))] = true
    }
    
    // map iteration order is unspecified, so the samples come out
    // in arbitrary order
    samples := make([]int, 0, n)
    for i := range indexes {
        samples = append(samples, a[i])
    }
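
If the retries become a concern, one alternative is Robert Floyd's sampling algorithm. It is a different technique from the retry loop above: it also records chosen indexes in a map, but draws exactly n random numbers and never redraws. The sketch below is illustrative only; the name sampleFloyd and the clamping of n are assumptions.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sampleFloyd (hypothetical helper) picks n distinct elements of a using
// Floyd's algorithm. It does not modify a and makes exactly n calls to
// rand.Intn, regardless of how n compares with len(a).
func sampleFloyd(a []int, n int) []int {
	if n > len(a) {
		n = len(a)
	}
	if n < 0 {
		n = 0
	}

	chosen := make(map[int]bool, n)
	samples := make([]int, 0, n)

	for j := len(a) - n; j < len(a); j++ {
		// Pick t uniformly from [0, j].
		t := rand.Intn(j + 1)
		// If t was already taken, take j instead; j cannot have been
		// chosen yet because every earlier pick was at most j-1.
		if chosen[t] {
			t = j
		}
		chosen[t] = true
		samples = append(samples, a[t])
	}
	return samples
}

func main() {
	a := make([]int, 100)
	for i := range a {
		a[i] = i
	}
	fmt.Println(sampleFloyd(a, 5))
}
```

Every subset of size n is equally likely, but the order of the returned elements is not uniformly random; shuffle the result if the order matters.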