I'm trying to scrape the specific text strings below from the HTML as separate values, e.g.:
let text = "That's the first text I need";
let text2 = "The second text I need";
let text3 = "The third text I need";
I really don't know how to extract text that's separated by different HTML tags.
<p>
<span class="hidden-text"><span class="ft-semi">Count:</span>31<br></span>
<span class="ft-semi">Something:</span> That's the first text I need
<span class="hidden-text"><span class="ft-semi">Something2:</span> </span>The second text I need
<br><span class="ft-semi">Something3:</span> The third text I need
</p>
You can iterate over the child nodes of the <p> and keep every node whose nodeType === Node.TEXT_NODE and whose content is non-empty after trimming:
for (const e of document.querySelector("p").childNodes) {
  if (e.nodeType === Node.TEXT_NODE && e.textContent.trim()) {
    console.log(e.textContent.trim());
  }
}

// or to make an array:
const result = [...document.querySelector("p").childNodes]
  .filter(e =>
    e.nodeType === Node.TEXT_NODE && e.textContent.trim()
  )
  .map(e => e.textContent.trim());
console.log(result);
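The markup used for the snippet above (the question's HTML, reformatted so the direct text children of the <p> are easier to see):
<p>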
<span class="hidden-text">
<span class="ft-semi">Count:</span>
31
<br>
</span>
<span class="ft-semi">Something:</span>
That's the first text I need
<span class="hidden-text">
<span class="ft-semi">Something2:</span>
</span>
The second text I need
<br>
<span class="ft-semi">Something3:</span>
The third text I need
</p>
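If you want the three separate variables from the question, the filter can be wrapped in a small helper and its result destructured. This is only a sketch: `getDirectTexts`, `mockText` and `mockElement` are hypothetical names, and the hard-coded `3` (the value of `Node.TEXT_NODE`) is there so the snippet also runs in plain Node.js, where no DOM and no global `Node` exist. In a browser you would pass `document.querySelector("p")` instead of building the mock.

```javascript
// Node.TEXT_NODE is 3 in every DOM implementation. Using the number directly
// lets this sketch run in plain Node.js, where the global `Node` is not defined.
const TEXT_NODE = 3;

// Trimmed, non-blank text of the *direct* text-node children of `parent`.
// Works on anything with a `childNodes` list whose items have `nodeType`
// and `textContent`: a browser element, a jsdom element, or the mock below.
function getDirectTexts(parent) {
  return Array.from(parent.childNodes)
    .filter(n => n.nodeType === TEXT_NODE && n.textContent.trim() !== "")
    .map(n => n.textContent.trim());
}

// Hypothetical mock of the question's <p>, modelling only nodeType,
// textContent (for text nodes) and childNodes (for elements).
const mockText = s => ({ nodeType: TEXT_NODE, textContent: s });
const mockElement = (...childNodes) => ({ nodeType: 1, childNodes });

const p = mockElement(
  mockText("\n"),
  mockElement(mockElement(mockText("Count:")), mockText("31"), mockElement()), // span.hidden-text
  mockText("\n"),
  mockElement(mockText("Something:")), // span.ft-semi
  mockText(" That's the first text I need\n"),
  mockElement(mockElement(mockText("Something2:")), mockText(" ")), // span.hidden-text
  mockText("The second text I need\n"),
  mockElement(), // <br>
  mockElement(mockText("Something3:")), // span.ft-semi
  mockText(" The third text I need\n")
);

// In a browser: const [text, text2, text3] = getDirectTexts(document.querySelector("p"));
const [text, text2, text3] = getDirectTexts(p);
console.log(text);  // That's the first text I need
console.log(text2); // The second text I need
console.log(text3); // The third text I need
```

Destructuring assumes the page always yields the texts in this order; if the markup can vary, checking the array length first is safer.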
In Cheerio, the same approach works with .contents(), which (unlike .children()) also returns text nodes:
const cheerio = require("cheerio"); // 1.0.0-rc.12
const html = `
<p>
<span class="hidden-text">
<span class="ft-semi">Count:</span>
31
<br>
</span>
<span class="ft-semi">Something:</span>
That's the first text I need
<span class="hidden-text">
<span class="ft-semi">Something2:</span>
</span>
The second text I need
<br>
<span class="ft-semi">Something3:</span>
The third text I need
</p>
`;
const $ = cheerio.load(html);
// .contents() includes text nodes; keep only the non-blank ones
const result = [...$("p").contents()]
  .filter(e => e.type === "text" && $(e).text().trim())
  .map(e => $(e).text().trim());
console.log(result);
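All of the snippets above look only at the direct children of the <p>, which is why "Count:", "31" and the label text inside the nested <span>s are skipped. A recursive walk would collect those too. The sketch below contrasts the two; `directTexts`, `allTexts` and the mock nodes are hypothetical, and `3` stands in for `Node.TEXT_NODE` so it runs in plain Node.js.

```javascript
const TEXT_NODE = 3; // value of Node.TEXT_NODE

// Only the direct children of `parent`, as in the answers above.
function directTexts(parent) {
  return Array.from(parent.childNodes)
    .filter(n => n.nodeType === TEXT_NODE && n.textContent.trim() !== "")
    .map(n => n.textContent.trim());
}

// Depth-first walk over every descendant, for comparison.
function allTexts(parent) {
  const out = [];
  for (const n of parent.childNodes) {
    if (n.nodeType === TEXT_NODE) {
      if (n.textContent.trim() !== "") out.push(n.textContent.trim());
    } else if (n.childNodes) {
      out.push(...allTexts(n)); // descend into nested elements
    }
  }
  return out;
}

// Hypothetical mock of the question's <p> (same shape as its markup).
const mockText = s => ({ nodeType: TEXT_NODE, textContent: s });
const mockElement = (...childNodes) => ({ nodeType: 1, childNodes });

const p = mockElement(
  mockText("\n"),
  mockElement(mockElement(mockText("Count:")), mockText("31"), mockElement()), // span.hidden-text
  mockText("\n"),
  mockElement(mockText("Something:")), // span.ft-semi
  mockText(" That's the first text I need\n"),
  mockElement(mockElement(mockText("Something2:")), mockText(" ")), // span.hidden-text
  mockText("The second text I need\n"),
  mockElement(), // <br>
  mockElement(mockText("Something3:")), // span.ft-semi
  mockText(" The third text I need\n")
);

console.log(directTexts(p).length); // 3
console.log(allTexts(p).length);    // 8
console.log(allTexts(p).slice(0, 3).join(" | ")); // Count: | 31 | Something:
```

Sticking to direct children is what makes the simple filter work here: the wanted strings are exactly the loose text sitting between the <p>'s child elements.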