c++ qt qstring qregexp

restore runtime unicode strings


I'm building an application that receives strings at runtime over TCP containing escaped Unicode, for example "\u7cfb\u8eca\u4e21\uff1a\u6771\u5317 ...". I have the following, but it does not compile: the compiler reports "incomplete universal character name \u", because \u escapes in string literals are resolved at compile time and must be followed by four hexadecimal digits.

QString restoreUnicode(QString strText)
   {
      QRegExp rx("\\\\u([0-9a-z]){4}");
      return strText.replace(rx, QString::fromUtf8("\u\\1"));
   }

I'm looking for a solution that works at runtime. I could break these strings up, convert the hexadecimal digits after each "\u" delimiter to an integer, and pass that to the QChar constructor, but I'd prefer a better way if one exists: I'm concerned about the time complexity of that approach, and I'm not an expert.

Does anyone have any solutions or tips?


Solution

  • For closure, and for anyone who comes across this thread in future, here is my initial solution, before tidying up the scope of the variables. I'm not a fan of it, but it works given the unpredictable mix of Unicode and ASCII in the stream, which I have no control over (I only write the client). Although Unicode is rare in the stream, it is better to handle it than to display ugly \u1234 sequences.

    QString restoreUnicode(QString strText)
    {
        // A literal backslash, 'u', then exactly four hexadecimal digits.
        QRegExp rxUnicode("\\\\u[0-9a-fA-F]{4}");
    
        bool bSuccessFlag;
        int iSafetyOffset = 0;
        int iNeedle = strText.indexOf(rxUnicode, iSafetyOffset);
    
        while (iNeedle != -1)
        {
            // Parse the four hex digits after "\u" as one UTF-16 code unit.
            QChar cCodePoint(strText.mid(iNeedle + 2, 4).toUShort(&bSuccessFlag, 16));
    
            if ( bSuccessFlag )
                strText.replace(iNeedle, 6, cCodePoint); // replace this escape in place
    
            // Resume just past the current position, so a failed parse cannot
            // lock the loop and decoded text is never decoded a second time.
            iSafetyOffset = iNeedle + 1;
            iNeedle = strText.indexOf(rxUnicode, iSafetyOffset);
        }
    
        return strText;
    }
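
On the time-complexity worry from the question: the same decoding can also be done in one left-to-right pass that builds a new string, instead of searching and replacing repeatedly. The following is only a rough sketch in standard C++ without Qt, so it can be compiled on its own. The names decodeUnicodeEscapes and hexValue are made up for illustration, and it assumes the incoming bytes are plain ASCII apart from the \uXXXX escapes. Each escape becomes one UTF-16 code unit, which is also how QChar stores characters.

```cpp
#include <cstddef>
#include <string>

// Returns the numeric value of a hexadecimal digit, or -1 if c is not one.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper (not part of Qt): decodes every "\uXXXX" escape in a
// single pass. Assumes the rest of the input is plain ASCII. Incomplete or
// malformed escapes are copied through unchanged.
std::u16string decodeUnicodeEscapes(const std::string &in)
{
    std::u16string out;
    out.reserve(in.size()); // the output is never longer than the input

    std::size_t i = 0;
    while (i < in.size())
    {
        // An escape needs six characters: '\', 'u' and four hex digits.
        if (in[i] == '\\' && i + 5 < in.size() && in[i + 1] == 'u')
        {
            unsigned value = 0;
            bool ok = true;
            for (std::size_t k = 2; k < 6; ++k)
            {
                const int digit = hexValue(in[i + k]);
                if (digit < 0) { ok = false; break; }
                value = value * 16 + static_cast<unsigned>(digit);
            }
            if (ok)
            {
                out.push_back(static_cast<char16_t>(value));
                i += 6; // skip the whole escape
                continue;
            }
        }
        // Ordinary character (or a malformed escape): copy it as-is.
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(in[i])));
        ++i;
    }
    return out;
}
```

Every input character is looked at a bounded number of times, so the work grows linearly with the length of the string. In a Qt program the same loop could presumably append QChar values to a QString instead of char16_t values to a std::u16string.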