Search code examples
gobinarybinary-search

Efficiently binary search a []byte rather than [][]byte


TL;DR I need to efficiently binary search a byte slice for a sequence of bytes.

Releated: byte[] array pattern search

I have a binary file of 16-byte ip addresses which i'd like to do a binary search on. I am embedding this file into the go Binary using Packr which will give a []byte of the file's data.

This meant I had to loop over the []byte to create a [][]byte to search on 16 bytes rather than 1 bytes. This loop is not efficient and i'm looking for a way to avoid having it.

I've produced a minimal example below without the use of Packr.

package main

import (
    "fmt"
    "io/ioutil"
)

func main() {
    // Get our file. It is a file with many 16-byte form IP.
    // head -c 32 bin/test | xxd
    // 00000000: 0100 0000 0000 0000 0000 0000 0000 0000  ................
    // 00000010: 0000 0000 0000 0000 1800 0000 0000 0000  ................

    buf, err := ioutil.ReadFile("bin/test")

    if err != nil {
        fmt.Println(err)
    }

    // This is too slow :-(
    // How could this loop be replaced by some additional logic in the binary search below
    data := [][]byte{}
    for i := 0; i < len(buf); i += 16 {
        data = append(data, buf[i:i+16])
    }

    i := sort.Search(len(data), func(i int) bool {
        fmt.Println(len(data[i]), len(n.Bytes()))
        return bytes.Compare(data[i], n.Bytes()) < 1
    })

    if i < len(data) {
        fmt.Println(data[i])
    } else {
        fmt.Println("Not found")
    }
}



Solution

  • Use the following code to binary search for a 16 byte IP address in a slice of bytes containing sorted IP addresses.

    func search16(haystack, needle []byte) int {
        return 16 * sort.Search(len(haystack)/16, 
           func(i int) bool { return bytes.Compare(haystack[i*16:(i+1)*16], needle) >= 0 })
    }
    

    The indexes seen by sort.Search are the byte offsets divided by 16. The result from sort.Search is multiplied by 16 to get a byte offset.