Tags: c++, linux, file, file-io, stream

How to copy a prefix of an input stream to a different stream in C++?


There's a neat trick that can be used to copy file contents in C++. If we have an std::ifstream input; for one file and an std::ofstream output; for a second file, the contents of input can be copied to output like this:

output << input.rdbuf();

This solution copies the entirety of the first file (or at least the part of the input stream that hasn't been consumed yet). My question is: how can I copy only a prefix (the first n bytes) of an input stream to an output stream?
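To make the "hasn't been consumed yet" part concrete, here is a minimal sketch using in-memory streams in place of files. The helper names copy_remaining and demo_rdbuf_copy are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Copies whatever is still unread in `in` into a string via the rdbuf() trick.
std::string copy_remaining(std::istream& in)
{
    std::ostringstream out;
    out << in.rdbuf();
    return out.str();
}

// Hypothetical demo: consume a few characters first, then copy the rest.
std::string demo_rdbuf_copy()
{
    std::istringstream input("hello world");
    char skipped[6];
    input.read(skipped, 6);        // consumes "hello "
    assert(input.gcount() == 6);
    return copy_remaining(input);  // only the unread tail is copied
}
```

Calling demo_rdbuf_copy() yields only the tail of the text, which shows that the trick has no way to stop early on its own.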

After looking through a bit of documentation, my idea was to somehow shorten the stream of the input and then copy it to the output stream:

input.rdbuf()->pubsetbuf(input.rdbuf()->eback(), length_to_output);
output << input.rdbuf();

The problem is that this won't compile, because eback is a protected member of the stream buffer; the snippet is only meant to illustrate my idea. I know that I could just read the entire input into a string and then write a substring of it to the output, but I am worried that this would be less time- and memory-efficient. That is why I would like to work with the streams directly, as above.
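For comparison, the string-based fallback mentioned above might look like the following sketch. The names copy_prefix_via_string and demo_string_prefix are hypothetical, and an in-memory std::istringstream stands in for the file stream.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>

// Baseline: read the whole input into memory, then write only the first n bytes.
void copy_prefix_via_string(std::istream& in, std::ostream& out, std::size_t n)
{
    const std::string all((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
    const std::size_t len = std::min<std::size_t>(n, all.size());
    out.write(all.data(), static_cast<std::streamsize>(len));
}

// Hypothetical demo with in-memory streams.
std::string demo_string_prefix()
{
    std::istringstream input("hello world");
    std::ostringstream output;
    copy_prefix_via_string(input, output, 5);
    assert(output.good());
    return output.str();
}
```

This holds the entire input in memory even when n is small, which is exactly the cost the question is trying to avoid.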


Solution

  • I tried different solutions, including the one presented by @Some programmer dude, and ultimately decided to go with a manual read and write loop. Below is the code that I used (based on this, with small modifications) and at the bottom are the benchmark results:

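    #include <cstddef>
    #include <istream>
    #include <memory>
    #include <ostream>

    // Copies `count` bytes from `in` to `out`; returns true only if all of them were copied.
    // Not noexcept: std::make_unique can throw std::bad_alloc.
    bool stream_copy_n(std::istream& in, std::ostream& out, std::size_t count)
    {
        const std::size_t buffer_size = 256 * 1024; // a bit larger buffer
        std::unique_ptr<char[]> buffer = std::make_unique<char[]>(buffer_size); // allocated on heap to avoid stack overflow
        while (count > buffer_size)
        {
            in.read(buffer.get(), buffer_size);
            out.write(buffer.get(), in.gcount()); // write only the bytes actually read
            if (!in || !out)
                return false; // short read or write failure
            count -= buffer_size;
        }

        // copy the remaining tail (at most buffer_size bytes)
        in.read(buffer.get(), count);
        out.write(buffer.get(), in.gcount());

        return in.good() && out.good(); // true if the copy was successful
    }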
    

    The benchmark results (copying an entire 1 GB file), measured as real time with the built-in Unix time command:

    Method | Time (s)
    Linux C function sendfile | 0.59
    std::filesystem::copy_file | 0.60
    Unix command cp | 0.69
    Manual read and write loop presented above | 0.78
    output << input.rdbuf() | 0.96
    std::copy_n(std::istreambuf_iterator<char>(input), std::filesystem::file_size(inputFilePath), std::ostreambuf_iterator<char>(output)); | 3.28
    std::copy_n(std::istream_iterator<char>(input), std::filesystem::file_size(inputFilePath), std::ostream_iterator<char>(output)); | 27.37

    Although it is not the fastest option, I chose the read/write loop because it works on stream objects and is not limited to copying files.
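To illustrate that last point, here is a minimal sketch that runs the same kind of loop on in-memory streams instead of files. copy_prefix is a condensed variant of the loop above, restated so the example is self-contained, with a deliberately tiny buffer so that several iterations happen. Both copy_prefix and demo_prefix are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Condensed variant of the read/write loop; the 4-byte buffer is for the demo only.
bool copy_prefix(std::istream& in, std::ostream& out, std::size_t count)
{
    std::vector<char> buffer(4);
    while (count > 0 && in && out)
    {
        const std::size_t chunk = std::min<std::size_t>(count, buffer.size());
        in.read(buffer.data(), static_cast<std::streamsize>(chunk));
        out.write(buffer.data(), in.gcount()); // write only what was read
        count -= static_cast<std::size_t>(in.gcount());
    }
    return count == 0 && out.good(); // true only if every requested byte was copied
}

// Hypothetical demo: copies the first n bytes of `text` and reports success via `ok`.
std::string demo_prefix(const std::string& text, std::size_t n, bool& ok)
{
    std::istringstream in(text);
    std::ostringstream out;
    ok = copy_prefix(in, out, n);
    return out.str();
}
```

When more bytes are requested than the input holds, this variant copies what is available and reports failure, so the caller can tell a complete prefix from a truncated one.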