QRegularExpression - How to extract string from between two <ca> tags?


I am trying to extract the text inside several tags, as follows:

Text File:

Internal Auto-Configured Settings File
________________________________________
(( Do not attempt to edit it manually ))
________________________________________
# Saved certifications:
<ca>
Text which I want to extract
</ca>
...
<cert>Another text I want to extract</cert>
...

In my code I open the file above, read its content, and store it in a QString. So far I've tried the following, without success:

QRegularExpression regex("<ca>(.*)</ca>", QRegularExpression::MultilineOption);
QRegularExpressionMatch match = regex.match(content);
QString ca = match.captured(1);

qDebug() << ca;
qDebug() << "\n\nDone!!";

I did the same for <cert>, but I get an empty string for both.


Solution

  • Instead of QRegularExpression::MultilineOption, use QRegularExpression::DotMatchesEverythingOption. By default, . does not match newline characters, so (.*) cannot span the line breaks between <ca> and </ca>. MultilineOption only changes where ^ and $ match; it does not affect the dot.

    Citing the documentation:

    The dot metacharacter (.) in the pattern string is allowed to match any character in the subject string, including newlines (normally, the dot does not match newlines). This option corresponds to the /s modifier in Perl regular expressions.

    This works as long as </ca> appears only once in the input, because the greedy .* extends to the last </ca> it can find.

    If it may appear more than once, modify your expression a bit:

    "<ca>(.*?)</ca>"
    

    This makes the quantifier lazy (instead of the default greedy), and causes it to match the closest closing tag </ca>.