Efficiently returning 1-9 bytes from a function

There are at least three ways to return bytes from a function.

As an iterator producing u8s.
Taking a Writer as an argument.
Taking an &mut [u8] argument.
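To make the three shapes concrete, here is a minimal sketch of their signatures. The byte format is a hypothetical stand-in: one length byte followed by the significant big-endian bytes of a u64, so every call yields 1 to 9 bytes. The names encode_to_slice, encode_to_writer and encode_iter are illustrative, not an existing API. The sketch also shows that the Write and iterator variants can be thin wrappers over the slice variant.

```rust
use std::io::{self, Write};

/// Option 3: write into a caller-provided buffer, return how many bytes were used.
/// Hypothetical encoding: length byte + significant big-endian bytes of `value`.
/// Panics if `buf` is shorter than the encoding (at most 9 bytes).
fn encode_to_slice(value: u64, buf: &mut [u8]) -> usize {
    let be = value.to_be_bytes();
    let skip = (value.leading_zeros() / 8) as usize; // leading zero bytes to drop
    let len = 8 - skip;
    buf[0] = len as u8;
    buf[1..1 + len].copy_from_slice(&be[skip..]);
    1 + len
}

/// Option 2: write to any `std::io::Write` (File, TcpStream, Vec<u8>, &mut [u8], ...).
/// Stages the bytes on the stack so the writer sees a single `write_all` call.
fn encode_to_writer<W: Write>(value: u64, mut w: W) -> io::Result<()> {
    let mut buf = [0u8; 9];
    let n = encode_to_slice(value, &mut buf);
    w.write_all(&buf[..n])
}

/// Option 1: return an iterator over the bytes.
fn encode_iter(value: u64) -> impl Iterator<Item = u8> {
    let mut buf = [0u8; 9];
    let n = encode_to_slice(value, &mut buf);
    // `IntoIterator::into_iter` iterates the array by value on every edition.
    IntoIterator::into_iter(buf).take(n)
}

fn main() -> io::Result<()> {
    let value = 0x1234;

    let mut buf = [0u8; 9];
    let n = encode_to_slice(value, &mut buf);

    let mut written = Vec::new();
    encode_to_writer(value, &mut written)?;

    let iterated: Vec<u8> = encode_iter(value).collect();

    // All three produce the same bytes: length 2, then 0x12 0x34.
    assert_eq!(&buf[..n], &[2, 0x12, 0x34]);
    assert_eq!(written, &buf[..n]);
    assert_eq!(iterated, &buf[..n]);
    Ok(())
}
```

Whether the wrappers end up as efficient as hand-written versions depends on inlining, so it is worth checking the generated code rather than assuming it.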

My function will be producing 1-9 bytes on each call.

I want it to be easily usable, and efficient, to:

Feed an iterator pipeline which consumes bytes.
Write to disk.
Write to network.
Write to memory.

Can I just implement the function once, and trust that the compiler will make a single implementation efficient for all use cases, when the user adapts the output?

Or do I need to implement it three (or more) times to cover all the use-cases efficiently?

Solution

Let's evaluate the options.

If we write a function that takes &mut [u8], and we want to use it:

To write to disk/network (both are std::io::Write), we can create a [u8; 9] array on the stack (9 bytes is the most the function writes), pass it to the function, and then pass the part that was actually written (&bytes[..written]) to Write::write_all().
Writing to memory is easiest: just pass a slice of the destination buffer that has room for at least 9 bytes, such as a [u8; 9] array or &mut out[pos..] of a larger buffer.
If we want to feed it into an iterator pipeline, we need a little boilerplate:

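```rust
let mut bytes = [0u8; 9]; // must be `mut` so `foo` can write into it
let written = foo(&mut bytes); // `foo` returns the number of bytes written
// Iterate the array by value (`.into_iter()` on arrays needs the 2021 edition)...
let chain_a = bytes.into_iter().take(written); // ...followed by further adapters
// Or iterate over just the written prefix:
let chain_b = bytes[..written].iter().copied(); // ...followed by further adapters
```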

But the compiler will probably optimize both ways into very efficient machine code.
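A minimal sketch of the &mut [u8] variant serving each of the uses above, assuming a hypothetical length-prefixed encoding (one length byte plus the significant big-endian bytes of a u64). The names encode_to_slice, write_value and pack are illustrative. A Vec<u8> stands in for a file or socket, since File and TcpStream implement std::io::Write as well.

```rust
use std::io::{self, Write};

/// Hypothetical encoder: one length byte followed by the significant
/// big-endian bytes of `value`, so it always writes 1 to 9 bytes.
/// Returns the number of bytes written; panics if `buf` is too short.
fn encode_to_slice(value: u64, buf: &mut [u8]) -> usize {
    let be = value.to_be_bytes();
    let skip = (value.leading_zeros() / 8) as usize; // leading zero bytes to drop
    let len = 8 - skip;
    buf[0] = len as u8;
    buf[1..1 + len].copy_from_slice(&be[skip..]);
    1 + len
}

/// Disk/network: stage on the stack, then hand the written prefix to one `write_all`.
fn write_value<W: Write>(value: u64, w: &mut W) -> io::Result<()> {
    let mut buf = [0u8; 9];
    let n = encode_to_slice(value, &mut buf);
    w.write_all(&buf[..n])
}

/// Memory: encode values back to back into one buffer, no temporary arrays.
fn pack(values: &[u64], out: &mut [u8]) -> usize {
    let mut pos = 0;
    for &v in values {
        pos += encode_to_slice(v, &mut out[pos..]);
    }
    pos
}

fn main() -> io::Result<()> {
    // A Vec<u8> stands in for a File or TcpStream here.
    let mut sink = Vec::new();
    write_value(0x1234, &mut sink)?;
    assert_eq!(sink, [2, 0x12, 0x34]);

    // 0 takes 1 byte, 1 takes 2 bytes, u64::MAX takes 9 bytes.
    let mut out = [0u8; 64];
    let used = pack(&[0, 1, u64::MAX], &mut out);
    assert_eq!(used, 12);

    // Iterator pipeline: iterate over the written prefix only.
    let mut buf = [0u8; 9];
    let n = encode_to_slice(300, &mut buf);
    let checksum: u32 = buf[..n].iter().map(|&b| u32::from(b)).sum();
    assert_eq!(checksum, 2 + 1 + 44); // bytes are [2, 0x01, 0x2C]
    Ok(())
}
```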

If we write a function that takes a std::io::Write, writing to disk/network is very easy, and we can even write to memory through the implementations of Write for &mut [u8] and Vec<u8>. Piping the output through iterators, however, requires the same boilerplate as above. Overall, it hardly matters whether the function writes to a &mut [u8] or to a type implementing Write.
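A minimal sketch of the Write-based variant, assuming the same kind of hypothetical encoding (length byte plus significant big-endian bytes of a u64) and an illustrative encode_to_writer that reports how many bytes it wrote. It shows the two in-memory targets: Vec<u8>, which grows as needed, and &mut [u8], whose Write implementation advances the slice past the written bytes and makes write_all fail with ErrorKind::WriteZero once the slice is full.

```rust
use std::io::{self, Write};

/// Hypothetical Write-based encoder: length byte + significant big-endian bytes.
/// Returns the number of bytes written (1 to 9).
fn encode_to_writer<W: Write>(value: u64, mut w: W) -> io::Result<usize> {
    let be = value.to_be_bytes();
    let skip = (value.leading_zeros() / 8) as usize; // leading zero bytes to drop
    let payload = &be[skip..];
    // Assemble on the stack so the writer sees a single write_all call.
    let mut buf = [0u8; 9];
    buf[0] = payload.len() as u8;
    buf[1..1 + payload.len()].copy_from_slice(payload);
    w.write_all(&buf[..1 + payload.len()])?;
    Ok(1 + payload.len())
}

fn main() -> io::Result<()> {
    // Growable memory: Vec<u8> implements Write by appending.
    let mut v = Vec::new();
    encode_to_writer(0x1234, &mut v)?;
    encode_to_writer(7, &mut v)?;
    assert_eq!(v, [2, 0x12, 0x34, 1, 7]);

    // Fixed memory: writing through `&mut &mut [u8]` advances the slice.
    let mut out = [0u8; 16];
    let remaining = {
        let mut cursor: &mut [u8] = &mut out;
        encode_to_writer(0x1234, &mut cursor)?;
        encode_to_writer(7, &mut cursor)?;
        cursor.len() // space left after both writes
    };
    let used = out.len() - remaining;
    assert_eq!(&out[..used], &[2, 0x12, 0x34, 1, 7]);

    // A slice that is too small yields an error instead of a panic.
    let mut tiny = [0u8; 2];
    let err = encode_to_writer(0x1234, &mut tiny[..]).unwrap_err();
    assert_eq!(err.kind(), io::ErrorKind::WriteZero);
    Ok(())
}
```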

However, if we return an iterator, we can easily put it in an iterator chain, and writing to memory is also easy with a for loop. But to write to the network or disk, we have to either call write() (or write_all()) multiple times, once per byte, which is usually less efficient than a single call unless the writer is buffered (for example wrapped in a std::io::BufWriter), or collect the bytes into a Vec first, which costs a heap allocation. So this is probably the worst option.
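To illustrate the cost, the sketch below uses a hypothetical CountingWriter that only records how many times write() is called, standing in for an unbuffered file or socket where each call is typically a system call. encode_iter implements a hypothetical 1 to 9 byte encoding (length byte plus significant big-endian bytes of a u64); both names are illustrative.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in for an unbuffered file or socket: counts `write` calls.
struct CountingWriter {
    calls: usize,
    data: Vec<u8>,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.data.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical iterator variant: length byte, then significant big-endian bytes.
fn encode_iter(value: u64) -> impl Iterator<Item = u8> {
    let skip = (value.leading_zeros() / 8) as usize;
    let len = 8 - skip;
    std::iter::once(len as u8).chain(IntoIterator::into_iter(value.to_be_bytes()).skip(skip))
}

fn new_writer() -> CountingWriter {
    CountingWriter { calls: 0, data: Vec::new() }
}

fn main() -> io::Result<()> {
    // Pipelines are the natural fit: no buffer, no length bookkeeping.
    let total: u32 = encode_iter(u64::MAX).map(u32::from).sum();
    assert_eq!(total, 8 + 8 * 255); // length byte 8, then eight 0xFF bytes

    // Byte by byte: one write call per byte (9 for u64::MAX).
    let mut raw = new_writer();
    for b in encode_iter(u64::MAX) {
        raw.write_all(&[b])?;
    }
    assert_eq!(raw.calls, 9);

    // Staging into a small stack array first: a single write call.
    let mut staged = new_writer();
    let mut buf = [0u8; 9];
    let mut n = 0;
    for b in encode_iter(u64::MAX) {
        buf[n] = b;
        n += 1;
    }
    staged.write_all(&buf[..n])?;
    assert_eq!(staged.calls, 1);

    // BufWriter collects the one-byte writes and forwards them in one call.
    let mut inner = new_writer();
    {
        let mut bw = BufWriter::new(&mut inner);
        for b in encode_iter(u64::MAX) {
            bw.write_all(&[b])?;
        }
        bw.flush()?;
    }
    assert_eq!(inner.calls, 1);
    assert_eq!(inner.data, raw.data);
    Ok(())
}
```

Staging into a stack array or wrapping the writer in std::io::BufWriter brings the underlying writes back to one, at the cost of an extra copy through the buffer.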