I'm using cheerio (cheeriojs) to scrape content from a site which has the following HTML layout.
<div class="foo"></div>
<p></p>
<p>
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
<br><br>
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
<br><br>
</p>
I'm able to reach this content using the each function from the docs, traversing the DOM for the ".foo" class like so:
$('.foo').each(function(i, el){
//Do something...
$(this).next().next().text();
});
From here I can convert this content to a string and retrieve it as I wish. However, the text comes back as one long unformatted string (i.e. a long essay of paragraphs with no spacing between them). Is there a way or trick to retrieve the content while keeping its formatting?
I've attempted the following:
`var fruits = [];
$('.foo').each(function(i, el){
fruits[i] = $(this).next().next().text();
});`
This was meant as a way to get the current tag and push it to an array, but it isn't much different from my earlier code. I'm assuming this would be possible if the <br> tags had some id or classes, but they don't. Is there a way I can directly target these <br> tags to get the text and retrieve it in proper format (i.e. with spacing between paragraphs)? At this juncture, I must ask those who are more familiar and experienced with cheerio whether what I'm trying to do in this particular case is even feasible with cheerio. I'm open to pursuing other routes, and would welcome recommendations for modules/libraries that could make this an easier task.
To recap: I want to retrieve all the text inside the second <p> tag, maintaining the format and spacing seen in the rendered HTML.
Thanks in advance.
If you ask for .text() it will strip formatting. If you ask for .html() it'll return all the content, preserving all the tags.
So change this:
fruits[i] = $(this).next().next().text();
To this:
fruits[i] = $(this).next().next().html();
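If you want plain text with the paragraph spacing kept, rather than raw markup, one option is to post-process the string that .html() gives you. The sketch below is only an illustration: htmlToTextWithBreaks is a hypothetical helper, not part of cheerio, and the sample string is made up to mirror the second <p> in the question. It assumes simple markup, does not decode HTML entities such as &amp;, and expects a string (so check for null first if the selector might not match anything).

```javascript
// Hypothetical helper (not a cheerio API): converts an HTML fragment,
// such as the string returned by .html(), into plain text while
// keeping every <br> as a line break.
function htmlToTextWithBreaks(html) {
  return html
    .replace(/\s+/g, ' ')            // collapse source whitespace, as a browser would
    .replace(/<br\s*\/?>/gi, '\n')   // each <br> (or <br/>) becomes a newline
    .replace(/<[^>]+>/g, '')         // drop any remaining tags
    .replace(/ *\n */g, '\n')        // remove stray spaces around newlines
    .trim();                         // drop leading/trailing whitespace
}

// Made-up sample shaped like the inner HTML of the second <p>.
const sample = '\nFirst paragraph.\n<br><br>\nSecond paragraph.\n<br><br>\n';
const text = htmlToTextWithBreaks(sample);
console.log(text);
```

Inside the each loop from the question, that would become fruits[i] = htmlToTextWithBreaks($(this).next().next().html()); so each array entry keeps a blank line between paragraphs.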