I use the Cheeriogs library for scraping:
https://github.com/tani/cheeriogs
This is the element whose href value I need to collect:
<a class="tnmscn" itemprop="url" href="/en/predictions-tips-wealdstone-solihull-moors-1455115">
This is the code I'm currently using to extract the value:
const contentText = UrlFetchApp.fetch(url).getContentText();
const $ = Cheerio.load(contentText);
const scrapurl = $('div.schema > div > div.tnms > div > a.tnmscn');
const urlmatch = $(scrapurl).attr('href').trim();
Logger.log(urlmatch);
But this isn't reliable: I'm afraid that if the positions on the site change, it will collect a link other than the one in the clickable element at that position.
So I'd like to make it more robust, and I tried using:
div.schema > div > div.tnms > div > a:contains("/en/predictions-tips")
That didn't work. How should I use contains for this need?
Additional info:
Page link:
https://www.forebet.com/en/teams/wealdstone
In your situation, how about modifying the selector as follows?
From:
const scrapurl = $('div.schema > div > div.tnms > div > a.tnmscn');
To:
const scrapurl = $('a.tnmscn[href^="/en/predictions"]');
or
const scrapurl = $('div.schema > div > div.tnms > div > a.tnmscn[href^="/en/predictions"]');
or
const scrapurl = $('div.schema > div > div.tnms > div > a[href^="/en/predictions"]');
With these selectors, /en/predictions-tips-wealdstone-solihull-moors-1455115 is retrieved, because they match the a tag (or the a tag with the class tnmscn) whose href starts with /en/predictions.
But the URL you are using yields 2 matching values, as already mentioned in Granitosaurus's comment. Since .attr('href') returns the value of the first matched element, the above modification can be used in your script when you want to retrieve only the 1st value.
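As a side note on why a:contains("/en/predictions-tips") likely matched nothing: :contains() tests the text inside an element, while [href^="..."] tests the start of the href attribute. Here is a minimal sketch of that difference in plain JavaScript. It uses made-up objects in place of Cheerio elements, so the text values and the second link are assumptions for illustration only, not data from the page.

```javascript
// Hypothetical stand-ins for parsed <a> elements (plain objects, not Cheerio nodes).
const anchors = [
  { href: '/en/predictions-tips-wealdstone-solihull-moors-1455115', text: 'Wealdstone Solihull Moors' },
  { href: '/en/teams/wealdstone', text: 'See /en/predictions-tips for more' },
];

// Roughly what a:contains("...") checks: the element's text content.
function matchesContains(anchor, needle) {
  return anchor.text.includes(needle);
}

// Roughly what a[href^="..."] checks: the beginning of the href attribute.
function matchesHrefPrefix(anchor, prefix) {
  return typeof anchor.href === 'string' && anchor.href.startsWith(prefix);
}

const byText = anchors.filter(a => matchesContains(a, '/en/predictions-tips'));
const byHref = anchors.filter(a => matchesHrefPrefix(a, '/en/predictions'));
```

In this sketch, byText picks the link whose text happens to mention the path, while byHref picks the link whose href actually starts with it, which is the behavior the question needs.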
If you want to retrieve 2 values, how about the following modification?
In this modification, the modified selectors above can also be used.
const url = "https://www.forebet.com/en/teams/wealdstone";
const contentText = UrlFetchApp.fetch(url).getContentText();
const $ = Cheerio.load(contentText);
// The shorter selector 'a.tnmscn[href^="/en/predictions"]' can also be used here.
const scrapurl = $('div.schema > div > div.tnms > div > a.tnmscn[href^="/en/predictions"]');
$(scrapurl).each(function() {
  // Read the href of each matched anchor.
  const urlmatch = $(this).attr('href');
  console.log(urlmatch);
});
When this script is run, the following result is obtained.
/en/predictions-tips-wealdstone-solihull-moors-1455115
/en/predictions-tips-crawley-town-leyton-orient-1474259
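If the page layout changes and an element has no href, .attr('href') returns undefined, and calling .trim() on it (as in the question's script) throws. The following is a small sketch of a guard in plain JavaScript, assuming the hrefs were first collected into an array. The helper name toPredictionUrls and the BASE constant are hypothetical, and the origin is simply taken from the page link in the question.

```javascript
// Assumed site origin, taken from the page URL in the question.
const BASE = 'https://www.forebet.com';

// Hypothetical helper: keeps only hrefs with the expected prefix
// and turns them into absolute URLs; non-string values are skipped.
function toPredictionUrls(hrefs, prefix) {
  return hrefs
    .filter(h => typeof h === 'string' && h.trim().startsWith(prefix))
    .map(h => BASE + h.trim());
}

const hrefs = [
  '/en/predictions-tips-wealdstone-solihull-moors-1455115',
  '/en/predictions-tips-crawley-town-leyton-orient-1474259',
  undefined, // what .attr('href') yields when an element has no href
];
const urls = toPredictionUrls(hrefs, '/en/predictions');
```

With this guard, an empty urls array signals that nothing matched, instead of the script failing on .trim().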