TL;DR I need to efficiently binary search a byte slice for a sequence of bytes.
Releated: byte[] array pattern search
I have a binary file of 16-byte ip addresses which i'd like to do a binary search on. I am embedding this file into the go Binary using Packr which will give a []byte
of the file's data.
This meant I had to loop over the []byte to create a [][]byte to search on 16 bytes rather than 1 bytes. This loop is not efficient and i'm looking for a way to avoid having it.
I've produced a minimal example below without the use of Packr.
package main
import (
"fmt"
"io/ioutil"
)
func main() {
// Get our file. It is a file with many 16-byte form IP.
// head -c 32 bin/test | xxd
// 00000000: 0100 0000 0000 0000 0000 0000 0000 0000 ................
// 00000010: 0000 0000 0000 0000 1800 0000 0000 0000 ................
buf, err := ioutil.ReadFile("bin/test")
if err != nil {
fmt.Println(err)
}
// This is too slow :-(
// How could this loop be replaced by some additional logic in the binary search below
data := [][]byte{}
for i := 0; i < len(buf); i += 16 {
data = append(data, buf[i:i+16])
}
i := sort.Search(len(data), func(i int) bool {
fmt.Println(len(data[i]), len(n.Bytes()))
return bytes.Compare(data[i], n.Bytes()) < 1
})
if i < len(data) {
fmt.Println(data[i])
} else {
fmt.Println("Not found")
}
}
Use the following code to binary search for a 16 byte IP address in a slice of bytes containing sorted IP addresses.
func search16(haystack, needle []byte) int {
return 16 * sort.Search(len(haystack)/16,
func(i int) bool { return bytes.Compare(haystack[i*16:(i+1)*16], needle) >= 0 })
}
The indexes seen by sort.Search are the byte offsets divided by 16. The result from sort.Search is multiplied by 16 to get a byte offset.