performance rust io

Why is the "efficient" BufRead example in the Rust by Example book efficient?


The Rust by Example book gives two examples (relevant here) of how to use BufRead. It first gives a "beginner friendly" example, before going on to a more "efficient method".

The beginner friendly example reads a file line by line:

use std::fs::File;
use std::io::{ self, BufRead, BufReader };

fn read_lines(filename: String) -> io::Lines<BufReader<File>> {
    // Open the file in read-only mode.
    let file = File::open(filename).unwrap(); 
    // Read the file line by line, and return an iterator of the lines of the file.
    return io::BufReader::new(file).lines(); 
}

fn main() {
    // Stores the iterator of lines of the file in lines variable.
    let lines = read_lines("./hosts".to_string());
    // Iterate over the lines of the file, and in this case print them.
    for line in lines {
        println!("{}", line.unwrap());
    }
}

The "efficient method" does nearly the same thing:

use std::fs::File;
use std::io::{self, BufRead};
use std::path::Path;

fn main() {
    // File hosts must exist in current path before this produces output
    if let Ok(lines) = read_lines("./hosts") {
        // Consumes the iterator, returns an (Optional) String
        for line in lines {
            if let Ok(ip) = line {
                println!("{}", ip);
            }
        }
    }
}

// The output is wrapped in a Result to allow matching on errors
// Returns an Iterator to the Reader of the lines of the file.
fn read_lines<P>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>>
where P: AsRef<Path>, {
    let file = File::open(filename)?;
    Ok(io::BufReader::new(file).lines())
}

Rust by Example says of the latter:

This process is more efficient than creating a String in memory especially working with larger files.

The latter is slightly cleaner, since it uses if let instead of unwrap, but why is it more efficient to return a Result? I assume that once we unwrap the outer Result in the second example (in if let Ok(lines) = read_lines("./hosts")), we hold the same iterator as in the first example, so performance should be identical. Why would it differ, then? And why does the iterator in the second example return a Result for each line?


Solution

  • You're right: the "beginner friendly" method is no less efficient, and it does not "create a String in memory". It seems many of us were confused.
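To make that concrete, here is a minimal sketch (not code from the book or from either pull request) showing that the two signatures hand back the same io::Lines iterator. The function names lines_direct, lines_wrapped and count_ok_and_err are hypothetical, and a generic reader stands in for File so the sketch does not depend on a ./hosts file. In the book's version the outer Result only reports whether File::open succeeded. The per-line Result exists in both versions because reading a line can fail on its own, for example on invalid UTF-8.

```rust
use std::io::{self, BufRead, BufReader, Read};

// Hypothetical stand-in for the "beginner friendly" signature:
// returns the line iterator directly.
fn lines_direct<R: Read>(source: R) -> io::Lines<BufReader<R>> {
    BufReader::new(source).lines()
}

// Hypothetical stand-in for the "efficient" signature: the very same
// iterator, wrapped in a Result. In the book's version the only fallible
// step before this point is File::open; here nothing can fail, so the
// function always returns Ok.
fn lines_wrapped<R: Read>(source: R) -> io::Result<io::Lines<BufReader<R>>> {
    Ok(BufReader::new(source).lines())
}

// Each item of io::Lines is an io::Result<String>, whichever function
// produced the iterator. This helper counts how many lines were read
// successfully and how many produced a read error.
// Assumption: the reader eventually reaches end of input; a reader that
// fails forever would keep this loop running.
fn count_ok_and_err<R: Read>(lines: io::Lines<BufReader<R>>) -> (usize, usize) {
    let mut ok = 0;
    let mut err = 0;
    for line in lines {
        match line {
            Ok(_) => ok += 1,
            Err(_) => err += 1,
        }
    }
    (ok, err)
}
```

Passing both iterators to count_ok_and_err over the same input gives the same counts, since after the outer Result is unwrapped the two code paths are identical. Neither version builds a String holding the whole file; each allocates one String per line.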

    There are currently at least two pull requests that try to fix the confusion (#1679 and #1681); maybe you can comment on the one you prefer.

    Both of these pull requests modify the beginner-friendly method to use read_to_string instead of BufRead.

    Using read_to_string makes the beginner-friendly method genuinely "NOT efficient", as the text from #1641 suggests.

    read_to_string also gives a real example of "create a String in memory". Funnily enough, the phrase "create a String in memory" has been there since the very first commit...

    ...at first, the phrase only described a hypothetical approach that would be less efficient...

    ...then #1641 added some actual code as a beginner-friendly method, but it was no less efficient!...

    ...so until #1679 or #1681, there was never actual code demonstrating the less efficient approach!
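For comparison, here is a rough sketch of what "create a String in memory" means next to the streaming approach. It is not the code from either pull request. The function names are hypothetical, and it reads from any std::io::Read value (an assumption made so the example does not depend on a ./hosts file). Both functions compute the length in bytes of the longest line. The first copies the entire input into one String before looking at any line, while the second holds only the current line plus BufReader's internal buffer.

```rust
use std::io::{self, BufRead, BufReader, Read};

// Whole-input approach: the entire input is copied into a single String
// first, so peak memory grows with the size of the input.
fn longest_line_eager<R: Read>(mut source: R) -> io::Result<usize> {
    let mut contents = String::new();
    source.read_to_string(&mut contents)?;
    Ok(contents.lines().map(|line| line.len()).max().unwrap_or(0))
}

// Streaming approach: BufReader pulls the input in chunks and lines()
// yields one String per line, which is dropped before the next is read.
fn longest_line_streaming<R: Read>(source: R) -> io::Result<usize> {
    let mut longest = 0;
    for line in BufReader::new(source).lines() {
        // Each item is an io::Result<String>; `?` forwards a read error.
        longest = longest.max(line?.len());
    }
    Ok(longest)
}
```

For a small file the difference is unlikely to matter. The streaming version pays off with large inputs, which is the case the book's sentence about efficiency refers to.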

    UPDATE: #1679 was merged into master.