c++ qt c++20 stdstring qt6

What is the most efficient way to convert a QStringView to a std::string?


I have a function that takes a QStringView argument. Among other things, this argument must be converted into a std::string. QString has the function toStdString() for this. Is there a way to avoid the potentially expensive intermediate step via QString?


Solution

  • Converting between character encodings is hard to get right. QString stores its text as 16-bit UTF-16 code units. toStdString() re-encodes that as UTF-8 by calling toUtf8(), which produces an intermediate QByteArray.

    QStringView also has toUtf8(), which returns a QByteArray and gives the same guarantees as the QString version.

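    std::string toStdString( QStringView view ) {
      auto bytes = view.toUtf8(); // allocates and converts.
      // Copies to a std::string. The cast is there because qsizetype is signed and
      // list-initialization does not allow the narrowing conversion to std::size_t.
      return {bytes.constData(), static_cast<std::size_t>(bytes.size())};
    }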
    

    This drops one memory copy compared to the naive version:

    std::string toStdString( QStringView view ) {
      return QString(view).toStdString();
    }
    

    One intermediate memory copy remains that can, in theory, be removed as well: you could convert the UTF-16 data of the QStringView directly into a buffer owned by the std::string.

    std::string toStdString( QStringView view ) {
      auto toUtf8 = QStringEncoder(QStringEncoder::Utf8);
      auto space = toUtf8.requiredSpace(view.length());
      std::string retval;
      // make a string of all nulls:
      retval.resize(space+1); // +1 probably not needed
      // Ideally use C++23's `resize_and_overwrite`
      // instead of `resize` above.  Without that, on large (>1k)
      // strings other solutions are faster.
    
      // output the UTF8 into the std::string:
      char* end = toUtf8.appendToBuffer(retval.data(), view);
      // Strip the nulls logically from the returned string:
      retval.resize(end-retval.data());
      return retval;
    }
    

    This is an attempt to avoid that intermediate buffer allocation. It may be incorrect and have bugs, but the design is sound.

    In theory, an even crazier system is possible: one that works out the exact space the UTF-8 output needs for the given UTF-16 input before (or while) creating the output buffer, instead of reserving the worst case.

    (The resize_and_overwrite optimization was added because of @BenjaminBuch's excellent analysis in another answer.)
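
The exact-size idea mentioned above can be sketched in plain standard C++ as a two-pass conversion: the first pass counts the UTF-8 bytes, the second writes them into a std::string that was allocated once at exactly that size. This is only an illustration under some assumptions, not what Qt does internally. The namespace `sketch` and the helpers `nextCodePoint`, `utf8Size`, `utf8Length` and `toUtf8String` are made-up names, std::u16string_view stands in for QStringView (one could presumably build it from `view.utf16()` and `view.size()`), and unpaired surrogates are replaced with U+FFFD, which may differ from what Qt's own converters emit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

namespace sketch { // hypothetical namespace, not part of Qt

inline bool isHigh(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool isLow(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Decodes one code point starting at in[i] and advances i past it.
// Assumption of this sketch: unpaired surrogates become U+FFFD.
inline char32_t nextCodePoint(std::u16string_view in, std::size_t& i) {
    const char16_t u = in[i++];
    if (isHigh(u) && i < in.size() && isLow(in[i])) {
        const char16_t lo = in[i++];
        return 0x10000 + ((static_cast<char32_t>(u) - 0xD800) << 10)
                       + (static_cast<char32_t>(lo) - 0xDC00);
    }
    if (isHigh(u) || isLow(u)) return 0xFFFD;
    return u;
}

// Number of UTF-8 bytes needed for one code point.
inline std::size_t utf8Size(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Pass 1: exact UTF-8 length of the whole input.
inline std::size_t utf8Length(std::u16string_view in) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < in.size();) n += utf8Size(nextCodePoint(in, i));
    return n;
}

// Pass 2: encode straight into a string of exactly the right size.
inline std::string toUtf8String(std::u16string_view in) {
    std::string out(utf8Length(in), '\0'); // single allocation, no final shrink
    char* p = out.data();
    for (std::size_t i = 0; i < in.size();) {
        const char32_t cp = nextCodePoint(in, i);
        if (cp < 0x80) {
            *p++ = static_cast<char>(cp);
        } else if (cp < 0x800) {
            *p++ = static_cast<char>(0xC0 | (cp >> 6));
            *p++ = static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            *p++ = static_cast<char>(0xE0 | (cp >> 12));
            *p++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            *p++ = static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            *p++ = static_cast<char>(0xF0 | (cp >> 18));
            *p++ = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            *p++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            *p++ = static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    assert(p == out.data() + out.size()); // both passes must agree
    return out;
}

} // namespace sketch
```

The trade-off is that the input is read twice, so whether this beats the worst-case allocation followed by a shrinking resize would have to be measured. The constructor still zero-fills the buffer, and C++23's resize_and_overwrite could replace it to skip that step.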