I am trying to scrape a web page with axios and cheerio. Here is the JS code:
const axios = require('axios');
const cheerio = require('cheerio');
const r = 459230;
const link = 'https://www.discogs.com/sell/release/';
const link_completo = link + r.toString();
const headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:94.0) Gecko/20100101 Firefox/94.0',
'Referer': 'http://www.discogs.com'
};
console.log(link_completo);
axios.get(link_completo, { headers })
.then((response) => {
const $ = cheerio.load(response.data);
const label = $('td a').text().trim();
console.log('Label:', label);
});
Here is the HTML:
<td><a href="/label/2564-Harvest" hreflang="en" class="link_1ctor">Harvest</a> – SHVL 767<!-- -->, <a href="/label/2564-Harvest" hreflang="en" class="link_1ctor">Harvest</a> – 1E 062 o 90749</td>
The line that extracts the label is:
const label = $('td a').text().trim();
but it gives an incorrect result:
Label: Harvest – SHVL 767, Harvest – 1E 062 o 90749
Here is the desired output:
The issue with your code is the CSS selector $('td a'). This selector is too broad: it matches every <a> inside every <td> on the page, not just the specific label you're interested in. Cheerio's text() function then returns the combined text of all matched elements, which is not the result you want.
To target the label information accurately, make the CSS selector more specific to the structure of the page. Inspecting the page, the label information appears to be inside a div with the class profile, and the label line itself inside a div with the class content:
<div class="content">
<a href="/label/2564-Harvest">Harvest</a> – SHVL 767,
<a href="/label/2564-Harvest">Harvest</a> – 1E 062○90749
</div>
Here's the updated code:
const axios = require('axios');
const cheerio = require('cheerio');
const releaseId = 459230;
const url = `https://www.discogs.com/sell/release/${releaseId}`;
const headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:94.0) Gecko/20100101 Firefox/94.0',
'Referer': 'http://www.discogs.com'
};
console.log(url);
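axios.get(url, { headers })
.then((response) => {
const $ = cheerio.load(response.data);
// First div.content inside div.profile holds the label line
const label = $("div.profile div.content").first().text().trim();
console.log('Label:', label);
})
.catch((error) => {
// Network errors and non-2xx responses (for example a 403) are reported here
console.error('Request failed:', error.message);
});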
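This updated script selects the first div with the class content that is inside a div with the class profile and logs its trimmed text. Note that text() returns all of the text in that div, catalog numbers included, so if you only want the label name you need to narrow the selection to the <a> elements inside it.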
By making these changes, you'll be able to accurately target the label information you're looking for.