Search code examples
c++performancestdstringzero-copy

Can I do a zero-copy std::string allocation in C++ from a const char * array?


Profiling of my application reveals that it is spending nearly 5% of CPU time in string allocation. In many, many places I am making C++ std::string objects from a 64MB char buffer. The thing is, the buffer never changes during the running of the program. My analysis of std::string(const char *buf,size_t buflen) calls is that that the string is being copied because the buffer might change after the string is made. That isn't the problem here. Is there a way around this problem?

EDIT: I am working with binary data, so I can't just pass around char *s. Besides, then I would have a substantial overhead from always scanning for the NULL, which the std::string avoids.


Solution

  • Binary data? Stop using std::string and use std::vector<char>. But that won't fix your issue of it being copied. From your description, if this huge 64MB buffer will never change, you truly shouldn't be using std::string or std::vector<char>, either one isn't a good idea. You really ought to be passing around a const char* pointer (const uint8_t* would be more descriptive of binary data but under the covers it's the same thing, neglecting sign issues). Pass around both the pointer and a size_t length of it, or pass the pointer with another 'end' pointer. If you don't like passing around separate discrete variables (a pointer and the buffer’s length), make a struct to describe the buffer & have everyone use those instead:

    struct binbuf_desc {
        uint8_t* addr;
        size_t len;
        binbuf_desc(addr,len) : addr(addr), len(len) {}
    }
    

    You can always refer to your 64MB buffer (or any other buffer of any size) by using binbuf_desc objects. Note that binbuf_desc objects don’t own the buffer (or a copy of it), they’re just a descriptor of it, so you can just pass those around everywhere without having to worry about binbuf_desc’s making unnecessary copies of the buffer.