I am trying to parse these strings with QRegExp, but I think my regular expression is incorrect.
I am given a string, and whenever it contains a newline and carriage return, it must be split into separate strings. Each string has the following format:
[CharSize][Inverted][Aligned]Data
For example, I may be given data like this:
QString s1 = "[6][1][0]Data1\n\r[5][0][1]Data2";
When I split it, there are two QStrings, and I extract the values inside the opening and closing brackets.
Another valid string looks like this:
QString s2 = "[7][0][1]Data3";
An invalid string looks like this:
QString s3 = "abc[8][1][1]Data4";
I applied the following QRegExp:
QRegExp clrf("\n\r|\r\n|\n");
QStringList sp = str.split(clrf);
The clrf QRegExp works fine for splitting on a newline followed by a carriage return, the reverse order, or a lone newline. Note: s1, s2 and s3 are all split correctly here.
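For reference, the split step can be emulated with Python's re module. This is a sketch under the assumption that re.split() behaves like QString::split(QRegExp) for this pattern; split_lines is a hypothetical helper name:

```python
import re

# Same alternation as the clrf QRegExp: "\n\r", then "\r\n", then a lone "\n".
# A lone "\r" is not treated as a line break by this pattern.
LINE_BREAK = re.compile("\n\r|\r\n|\n")

def split_lines(text):
    """Split text into records, emulating str.split(clrf) from the question."""
    return LINE_BREAK.split(text)

print(split_lines("[6][1][0]Data1\n\r[5][0][1]Data2"))  # ['[6][1][0]Data1', '[5][0][1]Data2']
print(split_lines("[7][0][1]Data3\r\n"))                # ['[7][0][1]Data3', '']
```

re.split() keeps the empty string produced by a trailing line break, as QString::split() does by default (KeepEmptyParts); such an empty record simply fails to match in the next step.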
The problem is here:
QRegExp value("[^a-z]?\\[([0-9a-z]+)\\]\\[([0-9a-z]+)\\]\\[([0-9a-z]+)\\]([A-Za-z0-9\\'\\ \"]*)");
With the pattern above, s1, s2 and s3 are all matched. However, s3 should NOT be matched, since its first character is not an opening bracket. Can you help me correct my QRegExp?
Thank you.
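Why s3 still matches: [^a-z]? is optional, so it can match zero characters, and indexIn() scans forward until it finds a match anywhere in the string. A minimal sketch using Python's re module (an emulation, not QRegExp itself; re.search() stands in for indexIn() and first_match is a hypothetical helper) shows the difference a ^ anchor makes:

```python
import re

# Pattern from the question (the [^a-z]? version), as a raw Python string.
# [^a-z]? is optional: it may match zero characters, so it does not stop
# the match from starting in the middle of the string.
unanchored = re.compile(
    r"[^a-z]?\[([0-9a-z]+)\]\[([0-9a-z]+)\]\[([0-9a-z]+)\]([A-Za-z0-9'\ \"]*)")

# Same pattern with ^ instead of [^a-z]?: the first character must be '['.
anchored = re.compile(
    r"^\[([0-9a-z]+)\]\[([0-9a-z]+)\]\[([0-9a-z]+)\]([A-Za-z0-9'\ \"]*)")

def first_match(rx, text):
    """Mimic QRegExp::indexIn(text): position of the first match, or -1."""
    m = rx.search(text)
    return m.start() if m else -1

for s in ["[7][0][1]Data3", "abc[8][1][1]Data4"]:
    print(s, first_match(unanchored, s), first_match(anchored, s))
# [7][0][1]Data3 0 0
# abc[8][1][1]Data4 3 -1
```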
EDIT: Entire Code:
#include <cstdio>
#include <QRegExp>
#include <QString>
#include <QStringList>

void parseString(QString str)
{
    QRegExp clrf("\n\r|\r\n|\n");
    QRegExp value("\\[([0-9a-z]+)\\]\\[([0-9a-z]+)\\]\\[([0-9a-z]+)\\]([A-Za-z0-9\\'\\ \"]*)");
    // QRegExp value("^\\[(\\d+)\\]\\[(\\d+)\\]\\[(\\d+)\\](.*)$");
    int p = 0, i = 0;
    int res;
    int cs = 0, inv = 0, al = 0;
    QStringList sp = str.split(clrf);
    XLineString ls;
    for (i = 0; i < sp.size(); ++i) {
        res = value.indexIn(sp[i], p);
        while (res != -1) {
            printf("Text=[%s]\n", value.cap(EData).toStdString().c_str());
            printf("Digit cs[%d] ", value.cap(ECharSize).toInt());
            printf("inv[%d] ", value.cap(EInvert).toInt());
            printf("al[%d]\n", value.cap(EAlignment).toInt());
            cs = value.cap(ECharSize).toInt();
            if (value.cap(EInvert).toInt())
                inv = 1;
            else
                inv = 0;
            if (value.cap(EAlignment).toInt())
                al = 1;
            else
                al = 0;
            ls.addLine(value.cap(EData).toStdString().c_str(), cs, inv, al);
            p += value.matchedLength();
            res = value.indexIn(str, p);
        }
    }
}
int main()
{
    QString str1[] = {
        "[12][0][0]DATA1\n\r[78][0][1]DATA2",
        "abc[1][1][1]THIS SHOULD NOT PASS",
    };
    for (int i = 0; i < sizeof(str1) / sizeof(str1[0]); ++i)
        parseString(str1[i]);
}
To answer your question, I tested this with PyQt5 (for simplicity):
from PyQt5.QtCore import QRegExp
import re

R = ["\\[([0-9a-z]+)\\]\\[([0-9a-z]+)\\]\\[([0-9a-z]+)\\]([A-Za-z0-9\\'\\ \"]*)",  # pattern from your code
     r"^\[(\d+)\]\[(\d+)\]\[(\d+)\](.*)$"]                                            # my modified, anchored version
tests = ["[6][1][0]Data1\n\r[5][0][1]Data2", "[7][0][1]Data3", "abc[8][1][1]Data4"]
s = re.compile("\n\r|\r\n|\n")  # emulate the QRegExp split
for pattern in R:
    rx = QRegExp(pattern)
    for T in tests:
        for t in s.split(T):
            print(rx.indexIn(t))
The results:
0
0
0
3
0
0
0
-1
Conclusion: your original pattern also works if you simply test that indexIn() returns 0 rather than testing that it does not return -1, while my modified version (anchored with ^) works either way.
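A minimal sketch of that check, again emulating QRegExp with Python's re module (re.search() standing in for indexIn(); is_valid is a hypothetical helper name):

```python
import re

# Pattern from the question's parseString(), without any anchor.
VALUE = re.compile(
    r"\[([0-9a-z]+)\]\[([0-9a-z]+)\]\[([0-9a-z]+)\]([A-Za-z0-9'\ \"]*)")

def is_valid(line):
    # search() finds the first match anywhere, like QRegExp::indexIn().
    m = VALUE.search(line)
    # Accept only a match at position 0, i.e. test "== 0" instead of "!= -1".
    return m is not None and m.start() == 0

for line in ["[6][1][0]Data1", "[5][0][1]Data2", "[7][0][1]Data3", "abc[8][1][1]Data4"]:
    print(line, is_valid(line))
```

In Python, VALUE.match(line) performs the same at-position-0 check directly.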
I think your best bet is simply to change your code so that it checks that the index equals 0. I would also suggest using an if statement rather than a while loop; otherwise you may repeatedly match the same line.
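Putting both suggestions together, here is a minimal Python sketch (re in place of QRegExp) of parseString() with one if check per split line. It also searches the split line itself, whereas the question's loop calls value.indexIn(str, p) on the whole input rather than on sp[i]. Assumptions: collecting tuples is a hypothetical stand-in for ls.addLine(), since XLineString is not shown, and group numbers 1 to 4 play the role of ECharSize, EInvert, EAlignment and EData, assuming those enum values are 1 to 4.

```python
import re

LINE_BREAK = re.compile("\n\r|\r\n|\n")
# The anchored pattern from the answer: digits in each bracket, rest of the line is the data.
VALUE = re.compile(r"^\[(\d+)\]\[(\d+)\]\[(\d+)\](.*)$")

def parse_string(text):
    """Sketch of parseString(): returns (char_size, inverted, aligned, data) tuples.

    Appending to a list stands in for XLineString::addLine(), which is not shown.
    """
    records = []
    for line in LINE_BREAK.split(text):
        m = VALUE.search(line)   # one check per line: 'if', not 'while'
        if m:
            cs = int(m.group(1))                   # ECharSize
            inv = 1 if int(m.group(2)) else 0      # EInvert
            al = 1 if int(m.group(3)) else 0       # EAlignment
            records.append((cs, inv, al, m.group(4)))  # EData
    return records

print(parse_string("[12][0][0]DATA1\n\r[78][0][1]DATA2"))
# [(12, 0, 0, 'DATA1'), (78, 0, 1, 'DATA2')]
print(parse_string("abc[1][1][1]THIS SHOULD NOT PASS"))
# []
```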