I have some large JSON files I want to parse, and I want to avoid loading all of the data into memory at once. I'd like a function or loop that returns each character to me one at a time.
I found this example for iterating over words in a string, and the ScanRunes function in the bufio package looks like it could return one character at a time. I also had the ReadRune method of bufio.Reader mostly working, but that felt like a pretty heavy approach.
I compared 3 approaches. All used a loop to pull content from either a bufio.Reader or a bufio.Scanner.

1. Reading runes with .ReadRune on a bufio.Reader. Checked for errors from the call to .ReadRune.
2. Reading bytes from a bufio.Scanner after calling .Split(bufio.ScanRunes) on the scanner. Called .Scan and .Bytes on each iteration, checking the result of each .Scan call for errors.
3. Same as #2, but reading text from the bufio.Scanner instead of bytes, using .Text. Instead of joining a slice of runes with string([]rune), I joined a slice of strings with strings.Join([]string, "") to form the final blobs of text.

The timing for 10 runs of each on a 23 MB JSON file was:

1. ReadRune: 0.65 s
2. Scanner with .Bytes: 2.40 s
3. Scanner with .Text: 0.97 s
So it looks like ReadRune is not too bad after all. It also makes for a smaller, less verbose loop, because each rune is fetched in one operation (.ReadRune) instead of two (.Scan and .Bytes).
Just read each rune one by one in a loop with the ReadRune method of bufio.Reader. See this example:
```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"log"
	"strings"
)

var text = `
The quick brown fox jumps over the lazy dog #1.
Быстрая коричневая лиса перепрыгнула через ленивую собаку.
`

func main() {
	// Wrap any io.Reader; here a string, but an *os.File works the same way.
	r := bufio.NewReader(strings.NewReader(text))
	for {
		// ReadRune decodes one UTF-8 character and returns its size in bytes.
		c, sz, err := r.ReadRune()
		if err == io.EOF {
			break // end of input
		}
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%q [%d]\n", string(c), sz)
	}
}
```