c++ qt qstring qnetworkaccessmanager

I can't find a substring in a string


I'm trying to parse data from a site using QNetworkAccessManager. To do this, I read the page into a QString, but when I search for a substring with indexOf, the result is wrong: one of the values is -1. What is the problem here?

void MainWindow::on_pushButton_clicked()
{
     pBar = new QProgressBar;
     pBar->setMaximum(0); // maximum
     pBar->setMinimum(0);
     pBar->show();

     QNetworkRequest request;
     QUrl url(tr("https://auto.ru/tver/cars/all/? utm_source=yandex_direct&utm_medium=direct.brand&utm_campaign=460_hand_desktop_used_brand_search_Tver_none_82222146&utm_content=cid%3A82222146%7Cgid%3A5114476686%7Caid%3A13330832990%7Cph%3A42898716904%7Cpt%3Apremium%7Cpn%3A1%7Csrc%3Anone%7Cst%3Asearch%7Ccgcid%3A0%7Cdt%3Adesktop&utm_term=auto+ru&adjust_t=cl4qttt_nsw4it6&adjust_campaign=82222146&adjust_adgroup=5114476686&tracker_limit=10000&adjust_ya_click_id=1049526807999603789&_openstat=ZGlyZWN0LnlhbmRleC5ydTs4MjIyMjE0NjsxMzMzMDgzMjk5MDt5YW5kZXgucnU6cHJlbWl1bQ&yclid=639777327900000255"));

     request.setUrl(url);

     this->manager->get(request);

     connect(manager, SIGNAL(finished(QNetworkReply *)), this, 
             SLOT(replyFinished(QNetworkReply *)));
  }

  void MainWindow::replyFinished(QNetworkReply *reply)
  {
      if (reply->error() == QNetworkReply::NoError)
      {
          QByteArray content= reply->readAll();
          QTextCodec *codec = QTextCodec::codecForName("utf8");
          QString page = codec->toUnicode(content.data());
          int startStrPos = page.indexOf("<div class=\"ListingCars 
                                     ListingCars_outputType_list\">");
          int endStrPos   = page.lastIndexOf("<div class=\"ListingCarsPagination\">");
  
           
          // startStrPos = -1
          qDebug() << startStrPos << endStrPos;

          QString ctn = page.mid(startStrPos, endStrPos - startStrPos);
          ui->textEdit->setPlainText(ctn);
     }
     reply->deleteLater();
     pBar->close();
  }

Solution

  • The string literal you search for spans two source lines, which is not correct. Either put it on a single line:

          int startStrPos = page.indexOf("<div class=\"ListingCars ListingCars_outputType_list\">");
    

    Or split the string into two adjacent literals for readability (the compiler joins them into one string):

          int startStrPos = page.indexOf("<div class=\"ListingCars" 
                                         " ListingCars_outputType_list\">");
    
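    To illustrate the second form: C++ concatenates adjacent string literals at compile time, so the split version is the same string as the single-line one, with no line break or indentation inside it. A minimal standalone sketch in standard C++ (the function names are made up for this illustration):

```cpp
#include <string>

// The search string written on a single source line.
std::string needleSingleLine()
{
    return "<div class=\"ListingCars ListingCars_outputType_list\">";
}

// The same string written as two adjacent literals. The compiler joins them,
// so the only character between the two class names is the space that
// starts the second literal.
std::string needleSplit()
{
    return "<div class=\"ListingCars"
           " ListingCars_outputType_list\">";
}
```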

    Using this technique, you can make the URL somewhat more readable:

         QUrl url(tr("https://auto.ru/tver/cars/all/?"
                     "utm_source=yandex_direct&"
                     "utm_medium=direct.brand&"
                     "utm_campaign=460_hand_desktop_used_brand_search_Tver_none_82222146&"
                     "utm_content=cid%3A82222146%7Cgid%3A5114476686%7Caid%3A13330832990%7Cph%3A42898716904%7Cpt%3Apremium%7Cpn%3A1%7Csrc%3Anone%7Cst%3Asearch%7Ccgcid%3A0%7Cdt%3Adesktop&"
                     "utm_term=auto+ru&"
                     "adjust_t=cl4qttt_nsw4it6&"
                     "adjust_campaign=82222146&"
                     "adjust_adgroup=5114476686&"
                     "tracker_limit=10000&"
                     "adjust_ya_click_id=1049526807999603789&"
                     "_openstat=ZGlyZWN0LnlhbmRleC5ydTs4MjIyMjE0NjsxMzMzMDgzMjk5MDt5YW5kZXgucnU6cHJlbWl1bQ&"
                     "yclid=639777327900000255"));
    

    Note, however, that indexOf looks for an exact match of the substring. The HTML might contain a variation of it with different spacing or extra attributes.
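
    One way to tolerate such variations is to match the tag with a regular expression instead of an exact string. The following is only a sketch in standard C++ (std::regex on a std::string, so it compiles without Qt). findDivWithClass is a hypothetical helper, and it assumes the class name contains no regex metacharacters. In the Qt program a similar pattern could presumably be used with QRegularExpression, and a real HTML parser would be more robust than either.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the offset of the first <div ...> tag whose
// class attribute contains className as a whole, space-separated token,
// or -1 if there is none (mirroring what indexOf returns on failure).
// Assumption: className contains no regex metacharacters.
long findDivWithClass(const std::string &html, const std::string &className)
{
    // <div, any other attributes, then class="[other ]className[ other]".
    const std::regex pattern(
        "<div\\b[^>]*\\sclass\\s*=\\s*\"([^\"]*\\s)?" + className +
        "(\\s[^\"]*)?\"[^>]*>");

    std::smatch match;
    if (!std::regex_search(html, match, pattern))
        return -1;
    return static_cast<long>(match.position(0));
}
```

    For example, findDivWithClass(html, "ListingCars_outputType_list") finds the opening tag even if the div carries other classes or attributes, and yields -1 if no such div exists.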

    EDIT:

    Requesting the page manually, I got a cookie screen and a robot detection page... There is a chance you are not parsing the final page. Furthermore, the string <div class="ListingCars ListingCars_outputType_list"> does not appear in the page I finally get. Did the page structure change since you last analyzed it?
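
    Because the markers may simply be absent from what the server returns (a cookie screen or a robot check, for example), it is worth checking both positions before cutting out the substring. A minimal sketch in standard C++ with std::string, so it stands alone without Qt. extractBetween is a hypothetical helper. In the Qt code the equivalent check would be that neither indexOf nor lastIndexOf returned -1:

```cpp
#include <string>

// Hypothetical helper: copies into result the text from the first occurrence
// of startMarker up to (not including) the last occurrence of endMarker.
// Returns false, leaving result untouched, when either marker is missing
// or the end marker lies before the start marker.
bool extractBetween(const std::string &page,
                    const std::string &startMarker,
                    const std::string &endMarker,
                    std::string &result)
{
    const std::size_t start = page.find(startMarker);
    const std::size_t end = page.rfind(endMarker);

    if (start == std::string::npos || end == std::string::npos || end < start)
        return false; // probably not the page that was expected

    result = page.substr(start, end - start);
    return true;
}
```

    When the function returns false, logging the beginning of the received page is a quick way to see whether it is the listing at all.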