
Efficiently returning 1-9 bytes from a function


There are at least three ways to return bytes from a function.

  • As an iterator producing u8s.
  • Taking a Writer as an argument.
  • Taking an &mut [u8] argument.

My function will be producing 1-9 bytes on each call.

I want it to be easy to use, and efficient, when the output is used to:

  • Feed an iterator pipeline which consumes bytes.
  • Write to disk.
  • Write to network.
  • Write to memory.

Can I just implement the function once, and trust that the compiler will make a single implementation efficient for all use cases when the user adapts the output?

Or do I need to implement it three (or more) times to cover all the use-cases efficiently?


Solution

  • Let's evaluate the options.

    If we write a function that takes &mut [u8], and we want to use it:

    • To write to disk/network (both are std::io::Write): create a [u8; 9] array (since the function writes at most 9 bytes), pass it to the function, and then pass the part that was actually written to Write::write_all().
    • To write to memory, which is the easiest case: just pass a mutable slice of a [u8; 9] array.
    • To feed it into an iterator pipeline, we will need a little boilerplate:
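    let mut bytes = [0u8; 9];
    let written = foo(&mut bytes);
    // By value (array `into_iter()` yields `u8` in edition 2021):
    bytes.into_iter().take(written).some_iterator_chain()
    // Or, borrowing only the written prefix:
    bytes[..written].iter().copied().some_iterator_chain()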
    

    Either way, the compiler will probably optimize this into very efficient assembly.
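    The &mut [u8] option could be sketched roughly as follows. The question never says what the function computes, so the encoding here (one length byte followed by the significant little-endian bytes of a u64), the extra `value` parameter, and the helper names `write_foo` and `checksum` are all assumptions made purely for illustration.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for the function from the question: writes 1 to 9
/// bytes into `buf` and returns how many were written. The encoding is made
/// up for illustration. Panics if `buf` is shorter than the output.
fn foo(value: u64, buf: &mut [u8]) -> usize {
    // Number of significant bytes in `value`, in 0..=8.
    let n = (64 - value.leading_zeros() as usize + 7) / 8;
    buf[0] = n as u8;
    buf[1..1 + n].copy_from_slice(&value.to_le_bytes()[..n]);
    1 + n
}

/// Disk/network: one stack buffer, then a single `write_all` call
/// with only the part that was actually written.
fn write_foo<W: Write>(value: u64, out: &mut W) -> io::Result<()> {
    let mut buf = [0u8; 9];
    let written = foo(value, &mut buf);
    out.write_all(&buf[..written])
}

/// Iterator pipeline: iterate over the written prefix of the buffer.
/// Summing the bytes is just a placeholder for a real pipeline.
fn checksum(value: u64) -> u32 {
    let mut buf = [0u8; 9];
    let written = foo(value, &mut buf);
    buf[..written].iter().copied().map(u32::from).sum()
}
```

    The buffer lives on the stack in both helpers, so no allocation is involved, and the disk/network path still issues a single write_all() call.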

    If we write a function that takes a std::io::Write, it will be very easy to write to disk/network, and we can even write to memory using the implementation of Write for &mut [u8], but piping the output through iterators will require the same boilerplate as above. Overall, it pretty much doesn't matter whether you choose to write to a &mut [u8] or to a type implementing Write.
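    A minimal sketch of the Write-based variant, under the same caveat: the encoding, the `value` parameter, and the helper name `to_memory` are hypothetical and only serve to make the example runnable. The in-memory case relies on the standard library's Write implementation for &mut [u8], which copies into the slice and advances it past the written bytes.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for the function from the question, this time
/// generic over any `Write` sink. Returns the number of bytes written (1..=9).
fn foo<W: Write>(value: u64, out: &mut W) -> io::Result<usize> {
    // Number of significant bytes in `value`, in 0..=8 (made-up encoding).
    let n = (64 - value.leading_zeros() as usize + 7) / 8;
    let mut buf = [0u8; 9];
    buf[0] = n as u8;
    buf[1..1 + n].copy_from_slice(&value.to_le_bytes()[..n]);
    // A single call, whatever the sink is (file, socket, Vec<u8>, slice).
    out.write_all(&buf[..1 + n])?;
    Ok(1 + n)
}

/// Writing to memory through `impl Write for &mut [u8]`.
fn to_memory(value: u64) -> io::Result<([u8; 9], usize)> {
    let mut bytes = [0u8; 9];
    // The cursor is a mutable slice; writing to it shrinks it from the front.
    let mut cursor: &mut [u8] = &mut bytes;
    let written = foo(value, &mut cursor)?;
    Ok((bytes, written))
}
```

    A File, a TcpStream, or a Vec<u8> can be passed to foo() in the same way, since all of them implement Write.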

    However, if we return an iterator, we can easily put it in an iterator chain, and writing to memory is also easy using a for loop. But to write to network/disk, you will have to either call write() (or write_all()) multiple times, which is usually less efficient than calling it once, or collect the bytes into a Vec, which is also less efficient. So this is probably the worst option.
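    For comparison, the iterator-returning variant might look like the sketch below. As before, the encoding and the `value` parameter are invented for illustration, and `to_memory`, `write_bytewise`, and `write_collected` are hypothetical helper names that show the two ways of reaching a Write sink described above.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for the function from the question, returning
/// its 1 to 9 bytes as an iterator (made-up encoding).
fn foo(value: u64) -> impl Iterator<Item = u8> {
    let n = (64 - value.leading_zeros() as usize + 7) / 8;
    let mut buf = [0u8; 9];
    buf[0] = n as u8;
    buf[1..1 + n].copy_from_slice(&value.to_le_bytes()[..n]);
    // Fully qualified call so the array is consumed by value in every edition.
    IntoIterator::into_iter(buf).take(1 + n)
}

/// Writing to memory with a plain loop; returns the number of bytes stored.
fn to_memory(value: u64, dst: &mut [u8]) -> usize {
    let mut written = 0;
    for (slot, byte) in dst.iter_mut().zip(foo(value)) {
        *slot = byte;
        written += 1;
    }
    written
}

/// Disk/network, option 1: one `write_all` call per byte.
fn write_bytewise<W: Write>(value: u64, out: &mut W) -> io::Result<()> {
    for byte in foo(value) {
        out.write_all(&[byte])?;
    }
    Ok(())
}

/// Disk/network, option 2: a single call, at the cost of a heap allocation.
fn write_collected<W: Write>(value: u64, out: &mut W) -> io::Result<()> {
    let bytes: Vec<u8> = foo(value).collect();
    out.write_all(&bytes)
}
```

    Both write helpers produce the same output; they differ only in how many calls reach the sink and in whether a temporary Vec is allocated.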