
Problem sending 2 files to the client using Boost.Asio, Error: read: End of file [asio.misc:2]


I want to send two files to the client: first img.jpg, then message.txt.

The first file, img.jpg, is received correctly, but message.txt is received with size zero.

The client output is:

img.jpg: 49152
message.txt: 4096
read: End of file [asio.misc:2]

Here is my client and server code:

server.cpp:

#include <iostream>
#include <string>
#include <fstream>
#include <vector>
#include <boost/asio.hpp>

int main(int argc, char* argv[])
{
    try
    {
        boost::asio::io_context io;
        std::cout << "Server Start\n";
        boost::asio::ip::tcp::acceptor acc(io,
            boost::asio::ip::tcp::endpoint(
                boost::asio::ip::tcp::v4(), 6666));

        for (;;) {
            boost::asio::ip::tcp::socket sock(io);
            acc.accept(sock);

            std::vector<std::string> names{ "img.jpg" , "message.txt"};
            std::vector<int> sizes{ 49152 , 4096 };

            for (int i = 0; i < 2; ++i) {
            
                //Send Header
                boost::asio::streambuf reply;
                std::ostream header(&reply);
                header << names[i] << " ";
                header << std::to_string(sizes[i]) << " ";
                header << "\r\n";
                boost::asio::write(sock, reply);

                //Send Bytes
                std::ifstream input(names[i], std::ifstream::binary);
                std::vector<char> vec(sizes[i]);
                input.read(&vec[i], sizes[i]);
                boost::asio::write(sock, boost::asio::buffer(vec, sizes[i]));
            }

            sock.close();
        }
        acc.close();
    }
    catch (std::exception& e)
    {
        std::cerr << e.what() << std::endl;
    }

    return 0;
}

client.cpp:

#include <iostream>
#include <string>
#include <fstream>
#include <vector>
#include <boost/asio.hpp>

int main(int argc, char* argv[])
{
    try
    {

        boost::asio::io_context io;
        boost::asio::ip::tcp::resolver resolv(io);
        boost::asio::ip::tcp::resolver::query q("127.0.0.1", "6666");
        boost::asio::ip::tcp::resolver::iterator ep = resolv.resolve(q);
        boost::asio::ip::tcp::socket sock(io);
        boost::asio::connect(sock, ep);

        //Get Files
        for (int i = 0; i < 2; ++i) {

            //Read Header
            boost::asio::streambuf reply;
            boost::asio::read_until(sock, reply, "\r\n");
            std::istream header(&reply);
        
            std::string fileName;
            int fileSize;
            header >> fileName;
            header >> fileSize;
            std::cout << fileName << ": " << fileSize << '\n';
            //Read File Data
            std::ofstream output(fileName, std::ofstream::binary | std::ofstream::app);

            std::vector<char> vec(fileSize);
            boost::asio::read(sock, boost::asio::buffer(vec, vec.size()));

            output.write(&vec[0], vec.size());

            output.close();
        }

        sock.close();

    }
    catch (const std::exception& e) {
        std::cerr << e.what() << std::endl;
    }

    return 0;
}

Solution

  • There are a number of issues.

    1. &vec[i] is a bug in the server side here:

      std::vector<char> vec(sizes[i]);
      input.read(&vec[i], sizes[i]);
      

      For i > 0 (every iteration after the first) this writes past the end of vec: the read starts at offset i but still stores sizes[i] bytes, while vec only holds sizes[i] elements. Read into vec.data() instead.

    2. your header format is sloppy: a filename containing spaces will be mis-parsed, and with no error handling on the client that ends in UB

    3. os << to_string(n) should just be os << n

    4. server uses a delimiter "\r\n" that the client... completely ignores. All your files begin at least with \r\n that wasn't consumed, and what's worse, the last two characters of the file will then be read as part of the next file's header. This, at best, will fail, but can lead to UB

    5. In fact, it will always lead to UB because there's a complete lack of error handling on the client side

    6. I notice now that you ALMOST skirt the issue by using a separate buffer for the header (streambuf reply;) and for the contents (directly into vec). However, read_until documents that it may read past the delimiter¹. So you should have written any data remaining in the streambuf to the file first, and subtracted its length from the amount still to be read.

      In short, I recommend using separate, exact-size buffers OR one DynamicBuffer (like streambuf) per stream.

    7. The same issue is with the client using a new buffer each iteration through the loop:

      for (int i = 0; i < 2; ++i) {
          // Read Header
          asio::streambuf reply;
      

      It should at least be outside the loop so any excess data received will correctly be used on the next iteration

    8. You usually want to deal with partial success of reads (i.e. accept data received together with EOF condition). Here it should not affect correctness because you are precisely limiting the body read to the expected size, but it is still a good habit

    9. You specify a redundant buffer size in asio::buffer(vec, vec.size()), which merely invites bugs. Leave it out to get the same behaviour without the risk of passing the wrong size: asio::buffer(vec) (e.g. it would avoid the UB mentioned earlier)
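    To make points 1 to 4 concrete, here is a minimal sketch using only the standard library, so it runs without a socket. The helper names (read_exact, make_header, parse_header) are made up for illustration and are not part of Asio; the header layout of quoted name, size and "\r\n" is an assumption that matches the client code shown further down.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <istream>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read exactly `size` bytes from `in` into a vector,
// starting at offset 0 (the original bug started writing at offset i).
std::optional<std::vector<char>> read_exact(std::istream& in, std::size_t size) {
    std::vector<char> vec(size);
    in.read(vec.data(), static_cast<std::streamsize>(vec.size()));
    if (static_cast<std::size_t>(in.gcount()) != vec.size())
        return std::nullopt; // the source was shorter than announced
    return vec;
}

// Hypothetical helper: builds one header line of the form  "name" size\r\n
// std::quoted keeps names with spaces in one piece.
std::string make_header(std::string const& name, std::size_t size) {
    assert(!name.empty());
    std::ostringstream os;
    os << std::quoted(name) << ' ' << size << "\r\n"; // no to_string needed
    return os.str();
}

struct Header {
    std::string name;
    std::size_t size = 0;
};

// Parses one header line and consumes it up to and including '\n',
// so the next byte in the stream is the first byte of the body.
std::optional<Header> parse_header(std::istream& in) {
    Header h;
    if ((in >> std::quoted(h.name) >> h.size).ignore(1024, '\n'))
        return h;
    return std::nullopt; // malformed header: report it instead of carrying on
}
```

    A name such as "my file.txt" survives the round trip, and a failed parse is reported to the caller instead of leaving the size uninitialized.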

    Demonstrating The Buffer Issues

    A combined server/client with halfway fixes: https://coliru.stacked-crooked.com/a/03bee101ff6e8a7a

    The client side adds a lot of error handling:

    asio::streambuf buf;
    for (;;) {
        asio::read_until(sock, buf, "\r\n");
    
        std::string name;
        int         size;
        if (std::istream header(&buf); (header >> std::quoted(name) >> size).ignore(1024, '\n')) {
            std::cout << name << ": " << size << '\n';
    
            std::vector<char> vec(size);
            boost::system::error_code ec;
            auto n = read(sock, asio::buffer(vec), ec);
    
            if (n != vec.size()) {
                std::cerr << "Read completed: " << ec.message() << std::endl;
                std::cerr << "Incomplete data (" << n << " of " << vec.size() << ")" << std::endl;
                std::cerr << "Streambuf still had " << buf.size() << " bytes (total: " << (n + buf.size()) << ")" << std::endl;
                break;
            }
            std::ofstream(name, std::ios::binary /*| std::ios::app*/).write(&vec[0], n);
        } else {
            std::cerr << "Error receiving header, header invalid?" << std::endl;
            break;
        }
    }
    

    This allows us to demonstrate the problem with the streambuf reading beyond the delimiter:

    main.cpp: 2712
    main.cpp: 2712
    Read completed: End of file
    Incomplete data (2217 of 2712)
    Streambuf still had 495 bytes (total: 2712)
    

    Or my local test:

    test.gif: 400557
    message.txt: 4096
    Incomplete data (3604 of 4096)
    Streambuf still had 492 (total: 4096)
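    The numbers above can be reproduced without a network. The following is a rough simulation, not Asio code: FakeSocket and read_until_delim are invented stand-ins, and the 512-byte chunk size is an assumption picked to resemble the outputs shown. It only illustrates why the buffer ends up holding body bytes after the header has been found.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for a TCP socket: hands out the stream in arbitrary chunks,
// because TCP preserves byte order but not message boundaries.
struct FakeSocket {
    std::string wire;        // everything the peer sent
    std::size_t chunk = 512; // bytes delivered per receive call (assumed)
    std::size_t pos = 0;

    // Appends the next chunk to `out`; returns the number of bytes delivered.
    std::size_t receive(std::string& out) {
        std::size_t n = std::min(chunk, wire.size() - pos);
        out.append(wire, pos, n);
        pos += n;
        return n;
    }
};

// Mimics the contract of read_until: keep receiving until `buf` contains the
// delimiter, then return the length of the prefix that ends with it.
// Returns 0 if the stream ended first. Bytes after the delimiter stay in `buf`.
std::size_t read_until_delim(FakeSocket& sock, std::string& buf,
                             std::string const& delim) {
    assert(!delim.empty());
    for (;;) {
        auto at = buf.find(delim);
        if (at != std::string::npos)
            return at + delim.size();
        if (sock.receive(buf) == 0)
            return 0;
    }
}
```

    With an 18-byte header followed by a 4096-byte body, the first call leaves 494 body bytes in the buffer. If that buffer is thrown away, or a separate read asks the socket for the full 4096 bytes, exactly those bytes go missing.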
    

    The clunky/naive way to fix might seem to be something like:

    if (std::istream header(&buf); (header >> std::quoted(name) >> size).ignore(1024, '\n')) {
        std::cout << name << ": " << size << '\n';
    
        std::cerr << "Streambuf still had " << buf.size() << " bytes" << std::endl;
        size -= buf.size();
        std::ofstream ofs(name, std::ios::binary /*| std::ios::app*/);
        ofs << &buf;
    
        std::cerr << "Adjusted size to read: " << size << std::endl;
    
        std::vector<char> vec(size);
        boost::system::error_code ec;
        auto n = read(sock, asio::buffer(vec), ec);
    
        if (n != vec.size()) {
            std::cerr << "Read completed: " << ec.message() << std::endl;
            std::cerr << "Incomplete data (" << n << " of " << vec.size() << ")" << std::endl;
            break;
        }
        ofs.write(&vec[0], n);
    } else {
        std::cerr << "Error receiving header, header invalid?" << std::endl;
        break;
    }
    

    And while it might appear to work correctly, it just invites new problems with small files, where the entire following files are "accidentally" used as the contents for the current file. Instead, just SayWhatYouMean(TM):

    if ((std::istream(&buf) >> name >> size).ignore(1024, '\n')) {
        std::cout << name << ": " << size << '\n';
    
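        // transfer_exactly counts bytes read by this call, so only ask for
        // what the streambuf does not already hold
        read(sock, buf, asio::transfer_exactly(size > buf.size() ? size - buf.size() : 0), ec);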
    
        if (buf.size() < size) {
            std::cerr << "Incomplete data" << std::endl;
            break;
        }
        std::ofstream(output_dir / name, std::ios::binary /*| std::ios::app*/)
            .write(buffer_cast<char const*>(buf.data()), size);
    
        buf.consume(size);
    } else {
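    The idea of "one buffer per stream" can be sketched without Asio. This is an illustration under assumptions, not the code from the answer: receive_files and the Receive callback are hypothetical names, the callback stands in for a socket read, and headers are the unquoted "name size\r\n" form. The point is that the buffer lives outside the loop and only ever gets topped up and consumed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using File = std::pair<std::string, std::string>; // name, contents

// Hypothetical stand-in for a socket read: appends whatever arrived next to
// the buffer and returns the byte count (0 at end of stream).
using Receive = std::function<std::size_t(std::string&)>;

std::vector<File> receive_files(Receive receive) {
    assert(receive != nullptr);
    std::vector<File> files;
    std::string buf; // ONE buffer for the whole stream, never discarded

    for (;;) {
        // 1. Make sure a complete header line is buffered.
        std::size_t eol;
        while ((eol = buf.find("\r\n")) == std::string::npos)
            if (receive(buf) == 0)
                return files; // stream ended (possibly mid-header)

        // 2. Parse "name size" and drop the header, delimiter included.
        std::istringstream header(buf.substr(0, eol));
        std::string name;
        std::size_t size = 0;
        if (!(header >> name >> size))
            return files; // malformed header
        buf.erase(0, eol + 2);

        // 3. Top up until the body is complete. Extra bytes are harmless
        //    because they stay in `buf` for the next iteration.
        while (buf.size() < size)
            if (receive(buf) == 0)
                return files; // truncated body

        // 4. Hand out exactly `size` bytes and consume them.
        files.emplace_back(name, buf.substr(0, size));
        buf.erase(0, size);
    }
}
```

    Because step 4 takes exactly size bytes, several small files arriving in a single chunk are split correctly, and a body that happens to contain "\r\n" is never mistaken for a header.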
    

    Full Fixes

    Also getting a file list from the command line instead of hardcoding files/sizes, and writing to an output directory for safety.

    Note that it now uses std::filesystem::path, which already uses std::quoted under the hood to protect against problems with filenames containing spaces: https://en.cppreference.com/w/cpp/filesystem/path/operator_ltltgtgt


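    #include <boost/asio.hpp>
    #include <algorithm> // std::ranges::contains (C++23)
    #include <cstdint>
    #include <filesystem>
    #include <fstream>
    #include <iostream>
    #include <ranges>
    #include <string_view>
    #include <vector>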
    namespace asio = boost::asio;
    using asio::ip::tcp;
    using std::ranges::contains;
    using std::filesystem::path;
    constexpr uint16_t PORT = 6666;
    
    int main(int argc, char* argv[]) try {
        asio::io_context io;
    
        std::vector<std::string_view> const
            args(argv + 1, argv + argc), 
            opts{"--client", "-c", "--server", "-s"};
    
        bool const server = contains(args, "--server") || contains(args, "-s");
        bool const client = contains(args, "--client") || contains(args, "-c");
    
        if (server) {
            std::cout << "Server Start" << std::endl;
    
            for (tcp::acceptor acc(io, {{}, PORT});;) {
                tcp::socket sock = acc.accept();
    
                for (path name : args) {
                    if (contains(opts, name))
                        continue;
                    auto size = file_size(name);
    
                    // Send Header
                    asio::streambuf buf;
                    std::ostream(&buf) << name << " " << size << "\r\n";
                    write(sock, buf);
    
                    // Send bytes
                    std::vector<char> vec(size);
                    std::ifstream(name, std::ios::binary).read(vec.data(), size);
                    write(sock, asio::buffer(vec));
                }
            }
        }
    
        if (client) {
            path output_dir = "./output/";
            create_directories(output_dir);
    
            tcp::socket sock(io);
            // connect(sock, tcp::resolver(io).resolve("127.0.0.1", std::to_string(PORT)));
            sock.connect({{}, PORT});
    
            asio::streambuf buf;
    
            for (boost::system::error_code ec;;) {
                read_until(sock, buf, "\r\n", ec);
    
                path   name;
                size_t size;
                if ((std::istream(&buf) >> name >> size).ignore(1024, '\n')) {
                    std::cout << name << ": " << size << '\n';
    
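                    // transfer_exactly counts bytes read by this call, so only
                    // ask for what the streambuf does not already hold
                    auto missing = size > buf.size() ? size - buf.size() : 0;
                    read(sock, buf, asio::transfer_exactly(missing), ec);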
    
                    if (buf.size() < size) {
                        std::cerr << "Incomplete data" << std::endl;
                        break;
                    }
                    std::ofstream(output_dir / name, std::ios::binary /*| std::ios::app*/)
                        .write(buffer_cast<char const*>(buf.data()), size);
    
                    buf.consume(size);
                } else {
                    std::cerr << "Error receiving header, header invalid?" << std::endl;
                    break;
                }
            }
        }
    } catch (std::exception const& e) {
        std::cerr << e.what() << std::endl;
        return 1;
    }
    

    With the live test of

    for a in {1..10}; do dd if=/dev/urandom bs=1 count=10 of=small-$a.txt; done 2>/dev/null
    g++ -std=c++2b -O2 -Wall -pedantic -pthread main.cpp
    ./a.out --server *.* &
    sleep 1; ./a.out --client
    kill %1
    md5sum {.,output}/*.* | sort
    

    Printed:

    Server Start
    "a.out": 187496
    "main.cpp": 2546
    "small-1.txt": 10
    "small-10.txt": 10
    "small-2.txt": 10
    "small-3.txt": 10
    "small-4.txt": 10
    "small-5.txt": 10
    "small-6.txt": 10
    "small-7.txt": 10
    "small-8.txt": 10
    "small-9.txt": 10
    Error receiving header, header invalid?
    024c40ee2e93ee2e6338567336094ba2  ./small-8.txt
    024c40ee2e93ee2e6338567336094ba2  output/small-8.txt
    164f873a00178eca1354b1a4a398bf0f  ./small-10.txt
    164f873a00178eca1354b1a4a398bf0f  output/small-10.txt
    2ff416d02ca7ea8db2b5cb489a63852d  ./small-6.txt
    2ff416d02ca7ea8db2b5cb489a63852d  output/small-6.txt
    4559b8844afe7d5090948e97a8cef8d8  ./small-7.txt
    4559b8844afe7d5090948e97a8cef8d8  output/small-7.txt
    6fd6eac47427bfda3fc5456afed99602  ./small-4.txt
    6fd6eac47427bfda3fc5456afed99602  output/small-4.txt
    76fa51d5d6f06b9c8483f5539cd5611b  ./a.out
    76fa51d5d6f06b9c8483f5539cd5611b  output/a.out
    8a114a62f0ad5e087d7b338eeebcadf1  ./small-1.txt
    8a114a62f0ad5e087d7b338eeebcadf1  output/small-1.txt
    b4f11b6ed8870d431c5ec579d12991c0  ./small-5.txt
    b4f11b6ed8870d431c5ec579d12991c0  output/small-5.txt
    e1f0f06f1226ff7c82f942684d22e100  ./small-3.txt
    e1f0f06f1226ff7c82f942684d22e100  output/small-3.txt
    ec3fabd7edd0870bcfa5bbcfc7f2c7ec  ./small-2.txt
    ec3fabd7edd0870bcfa5bbcfc7f2c7ec  output/small-2.txt
    f80c5bbe46af5f46e4d4bcb2b939bf38  ./main.cpp
    f80c5bbe46af5f46e4d4bcb2b939bf38  output/main.cpp
    ff225cbc0f536f8af6946261c4a6b3ec  ./small-9.txt
    ff225cbc0f536f8af6946261c4a6b3ec  output/small-9.txt
    

    BONUS

    Instead of doing text-based IO, consider sending the file size and name information in binary. For examples, see: https://stackoverflow.com/search?tab=newest&q=user%3a85371%20endian%20file&searchOn=3

    UPDATE: Turns out the bonus take removes ALL the complexity: no more streambuf, unbounded reads, partial success, parsing, or error handling.

    Given (big_uint64_t comes from <boost/endian/arithmetic.hpp>)

    using NetSize = boost::endian::big_uint64_t;
    using Header  = std::array<NetSize, 2>;
    

    You can now send everything in one go:

    for (path name : args) {
        if (contains(opts, name))
            continue;
    
        std::ifstream     ifs(name, std::ios::binary);
        std::vector<char> vec(std::istreambuf_iterator<char>(ifs), {});
    
        auto   namestr = name.native();
        Header header{namestr.size(), vec.size()};
    
        write(sock,
              std::array{
                  asio::buffer(header),
                  asio::buffer(namestr),
                  asio::buffer(vec),
              });
    }
    

    And on the receiving side:

    for (std::string name, data;;) {
        Header header;
        read(sock, asio::buffer(header));
    
        auto [namelen, datalen] = header;
        name.resize(namelen);
        data.resize(datalen);
    
        read(sock, std::array{asio::buffer(name), asio::buffer(data)});
    
        std::ofstream(output_dir / name, std::ios::binary /*| std::ios::app*/) << data;
    }
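    For readers without Boost.Endian at hand, the wire layout can be sketched with the standard library alone. This is a hand-rolled approximation, assuming the layout implied above: two 64-bit big-endian lengths (name, then data) followed by the name bytes and the data bytes. The function names are invented for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Encodes v as 8 bytes, most significant byte first (network byte order).
std::array<unsigned char, 8> to_big_endian(std::uint64_t v) {
    std::array<unsigned char, 8> out{};
    for (int i = 7; i >= 0; --i) {
        out[static_cast<std::size_t>(i)] = static_cast<unsigned char>(v & 0xFF);
        v >>= 8;
    }
    return out;
}

// Decodes the first 8 bytes of `bytes` as a big-endian 64-bit integer.
std::uint64_t from_big_endian(std::string_view bytes) {
    assert(bytes.size() >= 8);
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < 8; ++i)
        v = (v << 8) | static_cast<unsigned char>(bytes[i]);
    return v;
}

// Assumed frame layout: [name length][data length][name][data]
std::string encode_frame(std::string_view name, std::string_view data) {
    std::string frame;
    auto put_length = [&frame](std::uint64_t len) {
        auto const be = to_big_endian(len);
        frame.append(be.begin(), be.end());
    };
    put_length(name.size());
    put_length(data.size());
    frame.append(name);
    frame.append(data);
    return frame;
}

struct Frame {
    std::string name, data;
};

// Returns the first frame in `wire`, or nullopt if it is not complete yet.
std::optional<Frame> decode_frame(std::string_view wire) {
    if (wire.size() < 16)
        return std::nullopt;
    std::uint64_t const namelen = from_big_endian(wire.substr(0, 8));
    std::uint64_t const datalen = from_big_endian(wire.substr(8, 8));
    std::uint64_t const avail = wire.size() - 16;
    if (namelen > avail || datalen > avail - namelen)
        return std::nullopt; // written this way to avoid overflow on hostile lengths
    return Frame{std::string(wire.substr(16, namelen)),
                 std::string(wire.substr(16 + namelen, datalen))};
}
```

    Since both lengths are known before any payload is read, no delimiter search is needed and the payload may contain any bytes, including "\r\n" and NUL.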
    



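    ¹ TCP is a byte stream without message boundaries, so a single receive returns whatever has arrived, which may include bytes past the delimiter; any library reading from a stream socket behaves this way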