Tags: c++, c++17, stdvector

Efficient way to create an std::vector from a contiguous subset of another std::vector


I am currently receiving some data which is stored in a std::vector, and I am looking for an efficient way to drop the first few elements of this vector, which hold some header information, and store the rest of the data in another vector. In other words, my code looks something like this:

std::size_t len = 1000000000; // some big number
std::vector<std::uint8_t> input_data;
input_data.resize(len);

// receive_data takes a uint8_t pointer and a length, and writes len bytes of data through the pointer
receive_data(input_data.data(), input_data.size());

std::size_t header_len = 40;
std::vector<std::uint8_t> sliced_data;
sliced_data.resize(len - header_len);

// copy all the received data, minus the header, into a new vector
std::copy(input_data.begin() + header_len, input_data.end(), sliced_data.begin());

The problem with the above code is that it performs an expensive copy of almost the entire buffer just to remove the tiny header, which seems unnecessarily slow to me. Is there a way to create the new vector sliced_data, containing all the data in input_data minus the header, without copying?
Is there an efficient way of doing this using C++17? Can std::move do this, for example?
I know this would be trivial if I were dealing with uint8_t pointers; however, in my case the sliced data needs to be stored in a new vector whose first element corresponds to the element at index header_len of input_data.

Thanks


Solution

  • It's not possible with std::vector. A std::vector owns the data and manages it exclusively. Initializing it with some external data will always require a copy.

    Also note that std::move is not relevant here:
    If you move from input_data itself, the whole buffer is transferred, header included.
    And since the elements themselves are uint8_t, moving them individually (e.g. using std::move_iterator) is the same as copying them, so that does not help either.
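
    The two cases above can be sketched as follows. This is only an illustration, not part of the original code: the function names move_transfers_whole_buffer and move_elements_after_header are hypothetical. The check that the moved-to vector reuses the same allocation reflects how common standard library implementations behave.

    ```cpp
    #include <algorithm>
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <utility>
    #include <vector>

    // Case 1: moving the whole vector hands over the entire buffer.
    // Returns true if the destination owns the same allocation and the same
    // number of elements, i.e. the header bytes came along as well.
    bool move_transfers_whole_buffer() {
        std::vector<std::uint8_t> input_data(100, 7);
        const std::uint8_t* original_buffer = input_data.data();

        std::vector<std::uint8_t> moved(std::move(input_data));

        return moved.data() == original_buffer && moved.size() == 100;
    }

    // Case 2: element-wise std::move of the range after the header.
    // For uint8_t a "move" of an element is a plain copy, so this does the
    // same work as std::copy and still needs a second buffer.
    std::vector<std::uint8_t> move_elements_after_header(
            std::vector<std::uint8_t>& input_data, std::size_t header_len) {
        assert(header_len <= input_data.size());
        std::vector<std::uint8_t> sliced_data(input_data.size() - header_len);
        std::move(input_data.begin() + static_cast<std::ptrdiff_t>(header_len),
                  input_data.end(),
                  sliced_data.begin());
        return sliced_data;
    }
    ```

    In neither case do you end up with a vector that starts at index header_len of the original buffer without copying.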

    But if you have access to C++20, you can use std::span, which is a lightweight non-owning view over a contiguous sequence of elements.
    It does not require any copy when initialized from input_data:

    #include <span>
    //...
    std::span<std::uint8_t> sliced_data(input_data.begin() + header_len, len-header_len);
    

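    You will need to update any function that currently accepts sliced_data as a std::vector so that it takes a std::span instead, but it might be worth it performance-wise. Keep in mind that the span is only valid as long as input_data is alive and is not reallocated.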

    If you are limited to C++17, you might be able to use std::string_view, which is similar in concept (but it is oriented toward strings and is read-only, so it is a bit convoluted to use for uint8_t data).
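
    As a rough sketch of the C++17 route, assuming read-only access to the payload is enough: the helpers view_after_header and byte_at below are hypothetical names introduced for illustration, and the casts are what makes this approach convoluted.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <string_view>
    #include <vector>

    // Hypothetical helper: a read-only view of everything after the header.
    // No bytes are copied. The view is valid only while input_data is alive
    // and is not reallocated.
    std::string_view view_after_header(const std::vector<std::uint8_t>& input_data,
                                       std::size_t header_len) {
        assert(header_len <= input_data.size());
        // string_view works on char, so the uint8_t pointer has to be reinterpreted.
        const char* first =
            reinterpret_cast<const char*>(input_data.data()) + header_len;
        return std::string_view(first, input_data.size() - header_len);
    }

    // Hypothetical helper: read one byte back as uint8_t.
    // The cast is needed because plain char may be a signed type.
    std::uint8_t byte_at(std::string_view view, std::size_t index) {
        assert(index < view.size());
        return static_cast<std::uint8_t>(view[index]);
    }
    ```

    With these, std::string_view sliced_data = view_after_header(input_data, header_len); gives a view whose element 0 is the byte at index header_len of input_data. Unlike a std::span<std::uint8_t>, it cannot be used to modify the data.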