Tags: qt, arabic, qstring, qregexp

Wrapping Arabic text in RTL paragraphs with QRegExp


Given the string:

QString unformatted =
   "Some non arabic text"
   "بعض النصوص العربية"
   "Another non arabic text"
   "النص العربي آخر";

How can I get the following result, using QRegExp or some other way?

"<p>Some non arabic text</p>"
"<p dir='rtl'>بعض النصوص العربية</p>"
"<p>Another non arabic text</p>"
"<p dir='rtl'>النص العربي آخر</p>";

Thanks!


Solution

  • A function that finds runs of Arabic text and wraps each one in an RTL paragraph:

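    #include <QRegExp>
    #include <QString>

    // Wraps each run of Arabic text in <p dir='rtl'>...</p>.
    // Non-Arabic text is copied through unchanged.
    QString split_arabic(const QString &text){
        // One Arabic letter followed by one or more Arabic letters or spaces,
        // so a run must be at least two characters long.
        QRegExp rx("[\u0600-\u065F\u066A-\u06EF\u06FA-\u06FF][ \u0600-\u065F\u066A-\u06EF\u06FA-\u06FF]+");
        QString result;
        int last = 0;   // index just past the previous match
        int pos = 0;

        // Build the result in a single pass. Calling text.replace() once per
        // match would wrap a run twice if the same run occurred more than once.
        while ((pos = rx.indexIn(text, last)) != -1) {
            result += text.mid(last, pos - last);            // text before the match
            result += "<p dir='rtl'>" + rx.cap(0) + "</p>";  // the wrapped Arabic run
            last = pos + rx.matchedLength();
        }

        result += text.mid(last);   // text after the last match
        return result;
    }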
    

    Example:

    QString unformatted =
                "Some non arabic text"
                "بعض النصوص العربية"
                "Another non arabic text"
                "النص العربي آخر";
    
    
    qDebug()<<unformatted;
    qDebug()<<split_arabic(unformatted);
    

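    Output (the adjacent string literals concatenate into one string with no separators, and only the Arabic runs are wrapped; the non-Arabic text gets no <p> tags):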

    "Some non arabic textبعض النصوص العربيةAnother non arabic textالنص العربي آخر"
    "Some non arabic text<p dir='rtl'>بعض النصوص العربية</p>Another non arabic text<p dir='rtl'>النص العربي آخر</p>"