
QRegExp does not match even though regex101.com does


I need to extract some data from a string with a simple syntax. The syntax is this:

_IMPORT:[any text] - [HEX number] #[decimal number]

I created the regular expression shown in the code below:

 //SYNTAX:  _IMPORT:%1 - %2 #%3
 static const QRegExp matchImportLink("^_IMPORT:(.*?) - ([A-Fa-f0-9]+) #([0-9]+)$");
 QRegExp importLink(matchImportLink);
 QString qtWtf(importLink.pattern());
 const int index = importLink.indexIn(mappingName);

 qDebug()<< "Input string: "<<mappingName;
 qDebug()<< "Regular expression:"<<qtWtf;
 qDebug()<< "Result: "<< index;

For some reason this does not work; I get this output:

Input string:  "_IMPORT:ddd - 92806f0f96a6dea91c37244128f7d00f #0"
Regular expression: "^_IMPORT:(.*?) - ([A-Fa-f0-9]+) #([0-9]+)$"
Result:  -1

I even tried removing the anchors ^ and $, but that didn't help, and I need them anyway. The annoying thing is that this regex works perfectly when I paste the output into regex101.com, as you can see here: https://regex101.com/r/oT6cY3/1

Can anyone explain what is wrong here? Did I stumble upon a Qt bug? I use Qt 5.6. Is there any workaround for this?


Solution

  • It seems that QRegExp does not accept the lazy quantifier *? as valid. Check QRegExp::isValid() against your pattern: in my case it returned false, and that is why the match failed. The documentation states that an invalid pattern never matches.

    So the first thing I tried was dropping the ?, which matches your sample string perfectly and fills all three capturing groups. Here is my code.

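    #include <QDebug>
    #include <QRegExp>
    #include <QString>

    int main()
    {
        QString str("_IMPORT:ddd - 92806f0f96a6dea91c37244128f7d00f #0");
        // Same pattern as in the question, minus the lazy '?' that QRegExp rejects
        QRegExp exp("^_IMPORT:(.*) - ([A-Fa-f0-9]+) #([0-9]+)$");

        qDebug() << "pattern:" << exp.pattern();
        qDebug() << "valid:" << exp.isValid();
        int pos = 0;
        // Print every capture of every match; pos advances past each match
        while ((pos = exp.indexIn(str, pos)) != -1) {
            for (int i = 1; i <= exp.captureCount(); ++i)
                qDebug() << "pos:" << pos << "len:" << exp.matchedLength() << "val:" << exp.cap(i);
            pos += exp.matchedLength();
        }
        return 0;
    }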
    

    And here is the resulting output.

    pattern: "^_IMPORT:(.*) - ([A-Fa-f0-9]+) #([0-9]+)$"
    valid: true
    pos: 0 len: 49 val: "ddd"
    pos: 0 len: 49 val: "92806f0f96a6dea91c37244128f7d00f"
    pos: 0 len: 49 val: "0"
    

    Tested using Qt 5.6.1.

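    Also note that you can enable minimal (non-greedy) matching with QRegExp::setMinimal(true). QRegExp cannot make an individual quantifier lazy; minimal mode applies to all quantifiers in the pattern.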