
Why are iterators in Rust seemingly very slow when I iterate over them?


I have a large file that I am reading from using mmap. I want to do some operations on each line, so I call split() on it, which gives me an iterator for each line:

let file = File::open("myfile").unwrap();
let mmap = unsafe { MmapOptions::new().map(&file).unwrap() };
//splitting by newline
let iter = mmap.split(|elem| elem == &b'\n');

This works fine and runs very fast.

However, when I iterate over the iterator, the time jumps: the for loop takes around 4 times as long as reading and splitting did.

Also, this is without processing the line or doing anything inside the for loop:

for elem in iter {
  //process the line
}

Since performance matters here, I find it odd that reading and splitting the file is very fast, but going through the iterator is really slow. Am I missing something? My knowledge of Rust is limited, so I'm not sure whether I am doing something wrong. Is there anything that could help me optimize this and get faster access times?

Also, parallel iterators are not that helpful in my case; the overhead they add is not worth it.

Whole file:

use memmap::MmapOptions;
use std::fs::File;
use std::time::Instant;

fn main() {

    let now = Instant::now();
    let file = File::open("myfile").unwrap();
    let mmap = unsafe { MmapOptions::new().map(&file).unwrap() };
    let iter = mmap.split(|elem| elem == &b'\n');

    /*
    for elem in iter {
      //do nothing
    }
    */
    println!("{:?}", now.elapsed());
}

If I uncomment the for loop, the program becomes 4 times slower. I am building with the --release flag, so that is not the issue.


Solution

  • The code only looks slow when the for loop is uncommented because, without the loop, it does not do anything. Iterators are lazy and only perform work when they are consumed.
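    A minimal sketch of this laziness, assuming a small in-memory byte slice stands in for the memory-mapped file (an mmap derefs to a byte slice, so split behaves the same way on it). The helper predicate_calls is a hypothetical name: it counts how often the split predicate runs before and after the iterator is consumed.

```rust
use std::cell::Cell;

/// Hypothetical helper: returns (predicate calls before consuming,
/// predicate calls after consuming, number of lines).
fn predicate_calls(data: &[u8]) -> (usize, usize, usize) {
    let calls = Cell::new(0usize);
    // Creating the iterator only stores the slice and the closure.
    let iter = data.split(|b| {
        calls.set(calls.get() + 1);
        *b == b'\n'
    });
    let before = calls.get();
    // Consuming the iterator is what actually walks the bytes.
    let lines = iter.count();
    (before, calls.get(), lines)
}

fn main() {
    // Assumption: this literal stands in for the memory-mapped file contents.
    let data = b"first\nsecond\nthird";
    let (before, after, lines) = predicate_calls(data);
    println!("predicate calls before consuming: {before}, after: {after}, lines: {lines}");
}
```

    Building the iterator runs the predicate zero times; count() is what makes it look at every byte.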

    Quoting the relevant parts of The Rust Programming Language, chapter 13, section 2:

    In Rust, iterators are lazy, meaning they have no effect until you call methods that consume the iterator to use it up. [...] calling the next method on an iterator changes internal state that the iterator uses to keep track of where it is in the sequence. In other words, this code consumes, or uses up, the iterator
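    To make the internal state the book mentions visible, the hypothetical helper pull below calls next() by hand on the same kind of split iterator, again over an in-memory slice rather than a real mmap. Each call resumes where the previous one stopped, and once the data is used up the iterator keeps returning None.

```rust
/// Hypothetical helper: calls `next()` `n` times and records each result.
fn pull(data: &[u8], n: usize) -> Vec<Option<&[u8]>> {
    let mut lines = data.split(|b| *b == b'\n');
    // Every call advances the position stored inside `lines`.
    (0..n).map(|_| lines.next()).collect()
}

fn main() {
    // Assumption: a byte-string literal stands in for the mmap'd file.
    for (i, line) in pull(b"alpha\nbeta", 4).into_iter().enumerate() {
        // `line` is `None` once the iterator has been exhausted.
        println!("call {}: {:?}", i + 1, line.map(String::from_utf8_lossy));
    }
}
```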

    A for loop is an example of a construct that consumes the iterator. Calling .split() on the memory-mapped data only creates a single lazy iterator (a std::slice::Split value holding the slice and the predicate); it does not create one iterator per line, and it does not look for any newlines yet. Iterator adaptors, a common way of working with iterators, are lazy in the same way, as the book also describes:
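    A sketch of what the for loop does with that one iterator, written both as a for loop and as the while let loop it roughly desugars to. The names via_for, via_while_let and is_borrowed_from are hypothetical. The pointer check shows that each yielded line is a borrowed subslice of the original buffer rather than a copy.

```rust
/// Collects the lines with a `for` loop.
fn via_for(data: &[u8]) -> Vec<&[u8]> {
    let mut out = Vec::new();
    for line in data.split(|b| *b == b'\n') {
        out.push(line);
    }
    out
}

/// Same thing, spelled out: `for` calls `IntoIterator::into_iter` (which returns
/// an iterator unchanged) and then calls `next()` until it returns `None`.
fn via_while_let(data: &[u8]) -> Vec<&[u8]> {
    let mut out = Vec::new();
    let mut iter = data.split(|b| *b == b'\n');
    while let Some(line) = iter.next() {
        out.push(line);
    }
    out
}

/// True if `line` points into `data`, i.e. it was borrowed, not copied.
fn is_borrowed_from(line: &[u8], data: &[u8]) -> bool {
    let range = data.as_ptr_range();
    range.start <= line.as_ptr() && line.as_ptr() <= range.end
}

fn main() {
    // Assumption: an in-memory slice stands in for the mmap'd file.
    let data: &[u8] = b"one\ntwo\nthree";
    let lines = via_for(data);
    assert_eq!(lines, via_while_let(data));
    assert!(lines.iter().all(|line| is_borrowed_from(line, data)));
    println!("{} lines, all borrowed from the original buffer", lines.len());
}
```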

    Other methods defined on the Iterator trait, known as iterator adaptors, allow you to change iterators into different kinds of iterators. You can chain multiple calls to iterator adaptors to perform complex actions in a readable way. But because all iterators are lazy, you have to call one of the consuming adaptor methods to get results from calls to iterator adaptors.
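    A minimal sketch of a chain of adaptors (map, then filter) on top of split, using a hypothetical helper adaptor_calls and an in-memory slice in place of the mmap. Building the chain runs none of the closures; the consuming call count() drives every stage.

```rust
use std::cell::Cell;

/// Hypothetical helper: returns (map calls before consuming,
/// map calls after consuming, number of non-empty lines).
fn adaptor_calls(data: &[u8]) -> (usize, usize, usize) {
    let mapped = Cell::new(0usize);
    let chain = data
        .split(|b| *b == b'\n')
        .map(|line| {
            mapped.set(mapped.get() + 1);
            line.len()
        })
        .filter(|len| *len > 0); // keep only non-empty lines
    let before = mapped.get(); // nothing has run yet
    let non_empty = chain.count(); // the consuming call drives the whole chain
    (before, mapped.get(), non_empty)
}

fn main() {
    // Assumption: this literal stands in for the mmap'd file.
    let (before, after, non_empty) = adaptor_calls(b"a\n\nb\n");
    println!("map ran {before} times before count(), {after} times after; {non_empty} non-empty lines");
}
```

    For b"a\n\nb\n" split yields four slices ("a", "", "b" and a trailing empty one after the final newline), so map runs four times and two lines survive the filter.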

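    As such, the example does not eagerly create these splits in memory: the search for newlines only happens when the for loop is present, or when the iterator is consumed in some other way. The same is true of the file itself: mapping it with mmap does not read it, and pages are typically loaded from disk the first time they are touched, which in this program also happens inside the loop. The extra time is therefore most likely the cost of actually reading and scanning the file, not overhead added by the iterator.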
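    One way to see where the time goes is to time creation and consumption separately. The sketch below assumes a generated in-memory buffer instead of a real memory-mapped file (so it does not include page-fault costs), and uses std::hint::black_box (stable since Rust 1.66) so the optimizer cannot skip the work. The helper total_line_bytes is a hypothetical name showing fold as another consuming method besides a for loop or count().

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical helper: sums the line lengths. `fold`, like a `for` loop,
/// drives the iterator to completion.
fn total_line_bytes(data: &[u8]) -> usize {
    data.split(|b| *b == b'\n').fold(0, |acc, line| acc + line.len())
}

fn main() {
    // Assumption: a generated buffer stands in for the memory-mapped file.
    let data: Vec<u8> = b"some line of text\n".repeat(1_000_000);

    let start = Instant::now();
    let iter = data.split(|b| *b == b'\n'); // cheap: nothing is scanned yet
    let created = start.elapsed();
    let lines = black_box(iter).count(); // the newline scan happens here
    let consumed = start.elapsed() - created;

    let bytes = total_line_bytes(black_box(&data));
    println!("created in {created:?}; consumed {lines} lines in {consumed:?}; {bytes} bytes of line content");
}
```

    Exact timings depend on the machine; the point is only that the creation step does almost no work compared with the consuming step.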
